Compositions and methods for detecting rare sequence variants

ABSTRACT

In some aspects, the present disclosure provides methods for identifying sequence variants in a nucleic acid sample. In some embodiments, a method comprises identifying sequence differences between sequencing reads and a reference sequence, and calling as the sequence variant a sequence difference that occurs in at least two different circular polynucleotides (such as two circular polynucleotides having different junctions) or in two different sheared polynucleotides. In some aspects, the present disclosure provides compositions and systems useful in the described method.

CROSS-REFERENCE

This application is a continuation of U.S. patent application Ser. No. 16/896,073, filed Jun. 8, 2020, which is a continuation of U.S. patent application Ser. No. 16/172,310, filed Oct. 26, 2018, now U.S. Pat. No. 10,724,088, issued on Jul. 28, 2020, which is a continuation of U.S. patent application Ser. No. 15/800,558, filed Nov. 1, 2017, now U.S. Pat. No. 10,155,980, issued on Dec. 18, 2018, which is a continuation application of International Patent Application No. PCT/US2017/047029, filed Aug. 15, 2017, which claims the benefit of U.S. Provisional Patent Application No. 62/375,396, filed Aug. 15, 2016, each of which is incorporated by reference in its entirety.

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted electronically in XML format and is hereby incorporated by reference in its entirety. Said XML copy, created on Mar. 28, 2023, is named 47608707304_SL.xml and is 226,380 bytes in size.

BACKGROUND

Identifying sequence variation within complex populations is an actively growing field, particularly with the advent of large-scale parallel nucleic acid sequencing. However, large-scale parallel sequencing has a significant limitation: the inherent error frequency of commonly used techniques is higher than the frequency of many of the actual sequence variants in the population. For example, error rates of 0.1-1% have been reported for standard high-throughput sequencing. Detection of rare sequence variants therefore suffers from high false-positive rates when the variant frequency is low, such as at or below the error rate.
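The arithmetic behind this limitation can be sketched in Python. The example is illustrative only and adds assumptions not stated above: sequencing errors arise independently in different molecules, and a miscalled base is equally likely to be any of the three other bases. Under those assumptions, requiring the same difference in two independent molecules squares the chance that a given error is mistaken for a variant.

```python
# Illustrative arithmetic only. Assumptions (not from the disclosure):
# errors are independent between molecules, and an erroneous base call is
# equally likely to be any of the three non-reference bases.

def same_error_probability(error_rate: float, n_molecules: int) -> float:
    """Chance that one specific substitution arises by error at a given
    position in each of n_molecules independent molecules."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    per_molecule = error_rate / 3.0  # probability of this particular wrong base
    return per_molecule ** n_molecules


if __name__ == "__main__":
    for rate in (0.001, 0.01):  # the 0.1-1% range cited above
        single = same_error_probability(rate, 1)
        double = same_error_probability(rate, 2)
        print(f"error rate {rate:.1%}: one molecule {single:.2e}, two molecules {double:.2e}")
```

At a 1% error rate this gives roughly 3.3 × 10^-3 for a single molecule and roughly 1.1 × 10^-5 for two independent molecules.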

There is often a pressing need to detect rare sequence variants. For example, detecting rare characteristic sequences can be used to identify and distinguish the presence of a harmful environmental contaminant, such as particular bacterial taxa. A common way of characterizing bacterial taxa is to identify differences in a highly conserved sequence, such as an rRNA sequence. However, typical sequencing-based approaches to this task face challenges arising from the sheer number of different genomes in a given sample and the degree of homology between them, which adds complexity to already laborious procedures. Improved procedures could enhance contamination detection in a variety of settings. For example, the clean rooms used to assemble components of satellites and other spacecraft can be surveyed with the present systems and methods to understand which microbial communities are present. That information can be used to develop better decontamination and cleaning techniques that prevent the introduction of terrestrial microbes to other planets or samples thereof, or to develop methodologies that distinguish data generated by putative extraterrestrial microorganisms from data generated by contaminating terrestrial microorganisms. Food monitoring applications include periodic testing of production lines at food processing plants, surveying slaughterhouses, and inspecting the kitchens and food storage areas of restaurants, hospitals, schools, correctional facilities, and other institutions for food-borne pathogens. Water reserves and processing plants may be monitored similarly.

Rare variant detection can also be important for the early detection of pathological mutations. For instance, detection of cancer-associated point mutations in clinical samples can improve the identification of minimal residual disease during chemotherapy and can detect the appearance of tumor cells in relapsing patients. The detection of rare point mutations is also important for assessing exposure to environmental mutagens, monitoring endogenous DNA repair, and studying the accumulation of somatic mutations in aging individuals. Additionally, more sensitive methods for detecting rare variants can enhance prenatal diagnosis by enabling the characterization of fetal cells present in maternal blood.

SUMMARY

In view of the foregoing, there is a need for improved methods of detecting rare sequence variants. The compositions and methods of the present disclosure address this need and provide additional advantages as well. In particular, the various aspects of the disclosure provide for highly sensitive detection of rare or low-frequency nucleic acid sequence variants (sometimes referred to as mutations). This includes the identification and elucidation of low-frequency nucleic acid variations (including substitutions, insertions, and deletions) in samples that may contain low amounts of variant sequences in a background of normal sequences, as well as the identification of low-frequency variations in a background of sequencing errors.

In one aspect, the disclosure provides a method of performing rolling circle amplification, the method comprising (a) circularizing individual polynucleotides in a plurality of polynucleotides to form a plurality of circular polynucleotides using a ligase enzyme, wherein each polynucleotide of the plurality of polynucleotides has a 5′ end and a 3′ end prior to ligation; (b) degrading the ligase enzyme; and (c) amplifying the circular polynucleotides after degrading the ligase enzyme to produce amplified polynucleotides; wherein polynucleotides are not purified or isolated between steps (a) and (c). In some embodiments, the method further comprises degrading linear polynucleotides between steps (a) and (c). In some embodiments, the plurality of polynucleotides comprises single-stranded polynucleotides. In some embodiments, an individual circular polynucleotide has a junction that is distinct among the circularized polynucleotides. In some embodiments, circularizing comprises the step of joining an adapter polynucleotide to the 5′ end, the 3′ end, or both the 5′ end and the 3′ end of a polynucleotide in the plurality of polynucleotides. In some embodiments, amplifying comprises subjecting the circular polynucleotides to an amplification reaction mixture comprising random primers. In some embodiments, amplifying comprises subjecting the circular polynucleotides to an amplification reaction mixture comprising one or more primers, each of which specifically hybridizes to a different target sequence via sequence complementarity. In some embodiments, the sample is a sample from a subject. In some embodiments, the sample is urine, stool, blood, saliva, tissue, or bodily fluid. In some embodiments, the sample comprises tumor cells. In some embodiments, the sample is a formalin-fixed paraffin-embedded (FFPE) sample. In some embodiments, the method further comprises diagnosing, and optionally treating, said subject based on the calling step.
In some embodiments, the sequence variant is a causal genetic variant. In some embodiments, the sequence variant is associated with a type or stage of cancer. In some embodiments, the plurality of polynucleotides comprises cell-free polynucleotides. In some embodiments, the cell-free polynucleotides comprise circulating tumor DNA. In some embodiments, the cell-free polynucleotides comprise circulating tumor RNA. In some embodiments, the method further comprises sequencing the amplified polynucleotides to produce a plurality of sequencing reads. In some embodiments, the method further comprises identifying sequence differences between the sequencing reads and a reference sequence. In some embodiments, the method further comprises calling a sequence difference as a sequence variant in the plurality of polynucleotides only when: (i) the sequence difference is identified on both strands of a double-stranded input molecule; (ii) the sequence difference occurs in a consensus sequence for a concatemer formed by rolling circle amplification; and/or (iii) the sequence difference occurs in two different molecules. In some embodiments, a sequence difference is identified as occurring in two different molecules when the sequence difference occurs in at least two circular polynucleotides having a different junction formed between the 5′ end and 3′ end. In some embodiments, a sequence difference is identified as occurring in two different molecules when reads corresponding to the two different molecules have a different 5′ end and a different 3′ end.
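Criterion (ii) can be illustrated with a minimal Python sketch of a concatemer consensus. The helper name `concatemer_consensus`, the strict-majority vote, and the assumptions that the read begins at a monomer boundary and that the monomer length is already known are all hypothetical choices for illustration; they are not a procedure stated in the disclosure. The sketch shows why an error in one tandem copy is outvoted, while a difference present in every copy survives into the consensus.

```python
from collections import Counter


def concatemer_consensus(read: str, period: int) -> str:
    """Majority-vote consensus across the tandem copies in an RCA concatemer read.

    Hypothetical helper: assumes the read begins at the first base of a monomer
    and that the monomer length (`period`) is already known. A trailing partial
    copy still votes at the positions it covers.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    copies = [read[i:i + period] for i in range(0, len(read), period)]
    consensus = []
    for pos in range(period):
        bases = [copy[pos] for copy in copies if pos < len(copy)]
        if not bases:
            consensus.append("N")  # position not covered by the read
            continue
        base, count = Counter(bases).most_common(1)[0]
        # Require a strict majority; ties or split votes become "N".
        consensus.append(base if 2 * count > len(bases) else "N")
    return "".join(consensus)


# A random error in one copy is outvoted; a true variant present in every copy is kept.
print(concatemer_consensus("ACGTAC" + "ACGTAC" + "ACCTAC", 6))  # ACGTAC
print(concatemer_consensus("ACTTAC" * 3, 6))                     # ACTTAC
```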

In another aspect, the disclosure provides a method of identifying a sequence variant in a nucleic acid sample comprising a plurality of polynucleotides, each polynucleotide of the plurality having a 5′ end and a 3′ end, the method comprising: (a) circularizing individual polynucleotides of said plurality to form a plurality of circular polynucleotides, each of which has a junction between the 5′ end and 3′ end; (b) amplifying the circular polynucleotides of (a) to produce amplified polynucleotides; (c) shearing the amplified polynucleotides to produce sheared polynucleotides, each sheared polynucleotide comprising one or more shear points at a 5′ end and/or a 3′ end; (d) sequencing the sheared polynucleotides to produce a plurality of sequencing reads; (e) identifying sequence differences between sequencing reads and a reference sequence; and (f) calling a sequence difference as the sequence variant when the sequence difference occurs in at least two different sheared polynucleotides. In some embodiments, calling the sequence difference as the sequence variant occurs further when (i) the sequence difference occurs in at least two circular polynucleotides having different junctions; (ii) the sequence difference is identified on both strands of a double-stranded input molecule; and/or (iii) the sequence difference occurs in a consensus sequence for a concatemer formed by amplification comprising rolling circle amplification. In some embodiments, the plurality of polynucleotides comprises single-stranded polynucleotides. In some embodiments, circularizing is effected by subjecting the plurality of polynucleotides to a ligation reaction. In some embodiments, the sequence variant is a single nucleotide polymorphism. In some embodiments, the reference sequence is a consensus sequence formed by aligning the sequencing reads with one another. In some embodiments, the reference sequence is a sequencing read.
In some embodiments, circularizing comprises the step of joining an adapter polynucleotide to the 5′ end, the 3′ end, or both the 5′ end and the 3′ end of a polynucleotide in the plurality of polynucleotides. In some embodiments, amplifying is effected by using a polymerase having strand-displacement activity. In some embodiments, amplifying comprises subjecting the circular polynucleotides to an amplification reaction mixture comprising random primers. In some embodiments, amplifying comprises subjecting the circular polynucleotides to an amplification reaction mixture comprising one or more primers, each of which specifically hybridizes to a different target sequence via sequence complementarity. In some embodiments, the amplified polynucleotides are subjected to the sequencing step without enrichment. In some embodiments, the method further comprises enriching one or more target polynucleotides among the amplified polynucleotides by performing an enrichment step prior to sequencing. In some embodiments, a microbial contaminant is identified based on the calling step. In some embodiments, the sample is a sample from a subject. In some embodiments, the sample is urine, stool, blood, saliva, tissue, or bodily fluid. In some embodiments, the sample comprises tumor cells. In some embodiments, the sample is a formalin-fixed paraffin-embedded (FFPE) sample. In some embodiments, the method further comprises diagnosing, and optionally treating, said subject based on the calling step. In some embodiments, the sequence variant is a causal genetic variant. In some embodiments, the sequence variant is associated with a type or stage of cancer. In some embodiments, the plurality of polynucleotides comprises cell-free polynucleotides. In some embodiments, the cell-free polynucleotides comprise circulating tumor DNA. FIG. 14 provides an illustrative schematic of an exemplary workflow.
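The calling rule of step (f), with the optional junction criterion (i), can be expressed as a minimal Python sketch. The record type `SupportingRead` and its fields are hypothetical and are not defined in the disclosure. The sketch identifies a sheared polynucleotide by the mapped coordinates of its two shear points, so duplicate reads of one fragment count only once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupportingRead:
    """Hypothetical per-read record after alignment."""
    shear_start: int      # mapped coordinate of the fragment's 5′ shear point
    shear_end: int        # mapped coordinate of the fragment's 3′ shear point
    junction: str         # junction sequence (or ID) of the source circle
    has_difference: bool  # whether this read shows the candidate difference


def call_variant(reads: list[SupportingRead], min_fragments: int = 2,
                 require_distinct_junctions: bool = False) -> bool:
    """Call the candidate difference only if it is seen in at least
    `min_fragments` distinct sheared polynucleotides, optionally also
    requiring at least two distinct circularization junctions."""
    supporting = [r for r in reads if r.has_difference]
    # Reads sharing both shear points are treated as copies of one fragment.
    fragments = {(r.shear_start, r.shear_end) for r in supporting}
    if len(fragments) < min_fragments:
        return False
    if require_distinct_junctions:
        return len({r.junction for r in supporting}) >= 2
    return True
```

In this sketch, two reads with identical shear points add no extra support. Distinct shear points from the same circle satisfy step (f) but fail the stricter junction test.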

In another aspect, the disclosure provides a reaction mixture for performing a method according to any of the methods herein, wherein the reaction mixture comprises (a) a plurality of concatemers, wherein individual concatemers in the plurality comprise different junctions formed by circularizing individual polynucleotides having a 5′ end and a 3′ end; (b) a first primer comprising sequence A′, wherein the first primer specifically hybridizes to sequence A of the target sequence via sequence complementarity between sequence A and sequence A′; (c) a second primer comprising sequence B, wherein the second primer specifically hybridizes to sequence B′ present in a complementary polynucleotide comprising a complement of the target sequence via sequence complementarity between B and B′; and (d) a polymerase that extends the first primer and the second primer to produce amplified polynucleotides; wherein the distance between the 5′ end of sequence A and the 3′ end of sequence B of the target sequence is 75 nt or less. In some embodiments, the first primer comprises sequence C 5′ with respect to sequence A′, the second primer comprises sequence D 5′ with respect to sequence B, and neither sequence C nor sequence D hybridizes to the plurality of concatemers during a first amplification step in an amplification reaction.
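The primer geometry of this reaction mixture can be checked with a short Python sketch. The helper `design_b2b_primers` and its argument names are hypothetical. The sketch assumes a back-to-back layout in which sequence A lies 5′ of sequence B on the target strand, so the A′-containing primer extends toward the target's 5′ end and the B-containing primer extends toward its 3′ end. Tail sequences C and D are not modeled.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def design_b2b_primers(target: str, a: str, b: str, max_span: int = 75):
    """Return (first_primer, second_primer, span) for sequences A and B.

    Hypothetical helper. Assumes A occurs 5′ of B on the target strand
    (back-to-back layout); span is measured from the 5′ end of A to the
    3′ end of B, inclusive, and must not exceed max_span nucleotides.
    """
    start_a = target.find(a)
    start_b = target.find(b, start_a + len(a)) if start_a >= 0 else -1
    if start_a < 0 or start_b < 0:
        raise ValueError("A and B must both occur in the target, A before B")
    span = (start_b + len(b)) - start_a
    if span > max_span:
        raise ValueError(f"span of {span} nt exceeds {max_span} nt")
    # First primer carries A′ (complement of A, written 5′→3′); second carries B.
    return reverse_complement(a), b, span
```

For a target of `"GGGG" + "AAGCTTGA" + "TTTT" + "CCAGGTAC" + "AAAA"`, the sketch returns the primers `TCAAGCTT` and `CCAGGTAC` with a 20 nt span.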

In another aspect, the disclosure provides a system for detecting a sequence variant comprising (a) a computer configured to receive a user request to perform a detection reaction on a sample; (b) an amplification system that performs a nucleic acid amplification reaction on the sample or a portion thereof in response to the user request, wherein the amplification reaction comprises the steps of (i) circularizing individual polynucleotides in a plurality of polynucleotides to form a plurality of circular polynucleotides using a ligase enzyme, wherein each polynucleotide of the plurality of polynucleotides has a 5′ end and a 3′ end prior to ligation; (ii) degrading the ligase enzyme; and (iii) amplifying the circular polynucleotides after degrading the ligase enzyme to produce amplified polynucleotides; wherein polynucleotides are not purified or isolated between steps (i) and (iii); (c) a sequencing system that generates sequencing reads for polynucleotides amplified by the amplification system, identifies sequence differences between sequencing reads and a reference sequence, and calls a sequence difference that occurs in at least two circular polynucleotides having different junctions as the sequence variant; and (d) a report generator that sends a report to a recipient, wherein the report contains results for detection of the sequence variant. In some embodiments, the recipient is the user.

In another aspect, the disclosure provides a computer-readable medium comprising code that, upon execution by one or more processors, implements a method of detecting a sequence variant, the implemented method comprising: (a) receiving a customer request to perform a detection reaction on a sample; (b) performing a nucleic acid amplification reaction on the sample or a portion thereof in response to the customer request, wherein the amplification reaction comprises the steps of (i) circularizing individual polynucleotides in a plurality of polynucleotides to form a plurality of circular polynucleotides using a ligase enzyme, wherein each polynucleotide of the plurality of polynucleotides has a 5′ end and a 3′ end prior to ligation; (ii) degrading the ligase enzyme; and (iii) amplifying the circular polynucleotides after degrading the ligase enzyme to produce amplified polynucleotides; wherein polynucleotides are not purified or isolated between steps (i) and (iii); (c) performing a sequencing analysis comprising the steps of (i) generating sequencing reads for polynucleotides amplified in the amplification reaction; (ii) identifying sequence differences between sequencing reads and a reference sequence; and (iii) calling a sequence difference that occurs in at least two circular polynucleotides having different junctions as the sequence variant; and (d) generating a report that contains results for detection of the sequence variant.
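Analysis step (c)(ii) and reporting step (d) can be sketched in Python. The names `Difference`, `find_differences`, and `format_report` and the report layout are hypothetical, and the sketch assumes each read is already aligned to the reference without gaps at a known offset. A real pipeline would use a gapped aligner and would also handle insertions and deletions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Difference:
    position: int  # 0-based reference coordinate
    ref: str       # reference base
    alt: str       # base observed in the read


def find_differences(read: str, reference: str, offset: int) -> list[Difference]:
    """Substitutions between a read and the reference, assuming (hypothetically)
    a gapless alignment that starts at reference position `offset`."""
    if offset < 0 or offset + len(read) > len(reference):
        raise ValueError("read does not fit inside the reference at this offset")
    diffs = []
    for i, base in enumerate(read):
        ref_base = reference[offset + i]
        if base != ref_base:
            diffs.append(Difference(offset + i, ref_base, base))
    return diffs


def format_report(sample_id: str, variants: list[Difference]) -> str:
    """Plain-text report of called variants (layout is illustrative only)."""
    lines = [f"Sample {sample_id}: {len(variants)} variant(s) called"]
    lines += [f"  pos {v.position}: {v.ref}>{v.alt}" for v in variants]
    return "\n".join(lines)
```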

In another aspect, the disclosure provides a method of identifying a sequence variant in a nucleic acid sample comprising a plurality of polynucleotides, each polynucleotide of the plurality having a 5′ end and a 3′ end, the method comprising: (a) circularizing individual polynucleotides of said plurality using a ligase enzyme to form a plurality of circular polynucleotides, each of which has a junction between the 5′ end and 3′ end; (b) degrading the ligase enzyme; (c) amplifying the circular polynucleotides of (a) using random primers to produce amplified polynucleotides; (d) shearing the amplified polynucleotides to produce sheared polynucleotides; (e) sequencing the sheared polynucleotides to produce a plurality of sequencing reads; (f) identifying sequence differences between sequencing reads and a reference sequence; and (g) calling a sequence difference as the sequence variant when: (i) the sequence difference is identified on both strands of a double-stranded input molecule; (ii) the sequence difference occurs in a consensus sequence for a concatemer formed by rolling circle amplification; (iii) the sequence difference occurs in at least two different sheared polynucleotides; and/or (iv) the sequence difference occurs in two different molecules; wherein a sequence difference is identified as occurring in two different molecules when the sequence difference occurs in at least two circular polynucleotides having a different junction formed between the 5′ end and 3′ end.

In another aspect, the disclosure provides a method of identifying a sequence variant in a nucleic acid sample comprising a plurality of polynucleotides, each polynucleotide of the plurality having a 5′ end and a 3′ end, the method comprising: (a) circularizing individual polynucleotides of said plurality using a ligase enzyme to form a plurality of circular polynucleotides, each of which has a junction between the 5′ end and 3′ end; (b) degrading the ligase enzyme; (c) amplifying the circular polynucleotides of (a) using one or more primers, each of which specifically hybridizes to a different target sequence via sequence complementarity, to produce amplified polynucleotides; (d) shearing the amplified polynucleotides to produce sheared polynucleotides; (e) sequencing the sheared polynucleotides to produce a plurality of sequencing reads; (f) identifying sequence differences between sequencing reads and a reference sequence; and (g) calling a sequence difference as the sequence variant when: (i) the sequence difference is identified on both strands of a double-stranded input molecule; (ii) the sequence difference occurs in a consensus sequence for a concatemer formed by rolling circle amplification; and/or (iii) the sequence difference occurs in two different molecules; wherein a sequence difference is identified as occurring in two different molecules when the sequence difference occurs in at least two circular polynucleotides having a different junction formed between the 5′ end and 3′ end.

In an aspect, the disclosure provides a method of identifying a sequence variant in a nucleic acid sample comprising a plurality of polynucleotides, each polynucleotide of the plurality having a 5′ end and a 3′ end, the method comprising (a) circularizing individual polynucleotides of the plurality to form a plurality of circular polynucleotides, wherein a given circular polynucleotide of the plurality has a junction sequence resulting from the circularization; (b) amplifying the circular polynucleotides of (a) to produce a plurality of amplified polynucleotides, wherein a first amplified polynucleotide of the plurality and a second amplified polynucleotide of the plurality comprise the junction sequence but comprise different sequences at their respective 5′ and/or 3′ ends; (c) sequencing the plurality of amplified polynucleotides or amplification products thereof to produce a plurality of sequencing reads corresponding to the first amplified polynucleotide and the second amplified polynucleotide; and (d) calling a sequence difference detected in the sequencing reads as the sequence variant when the sequence difference occurs in sequencing reads corresponding to both the first amplified polynucleotide and the second amplified polynucleotide.
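A minimal Python sketch shows how circularization creates a junction sequence. The helper name `junction_sequence` and the fixed flank length are hypothetical choices. The sketch joins a linear molecule's 3′ end to its own 5′ end and reports the bases spanning the ligation point. Two input molecules from the same locus but with different ends produce different junctions, which is what allows their amplification products to be told apart.

```python
def junction_sequence(linear: str, flank: int = 5) -> str:
    """Bases spanning the ligation point when a linear molecule's 3′ end is
    joined to its own 5′ end: the last `flank` bases, then the first `flank`.

    Hypothetical helper for illustration; adapters are not modeled.
    """
    if flank <= 0 or len(linear) < 2 * flank:
        raise ValueError("molecule too short for the requested flank")
    return linear[-flank:] + linear[:flank]


locus = "TTACGGATCCAGTACGATCG"
# Two input molecules covering the same locus but with different 5′/3′ ends.
print(junction_sequence(locus[0:15]))  # AGTACTTACG
print(junction_sequence(locus[3:18]))  # ACGATCGGAT
```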

In some embodiments, circularizing individual polynucleotides in (a) is effected by a ligase enzyme. In some embodiments, prior to (b), the ligase enzyme is degraded. In some embodiments, uncircularized polynucleotides are degraded prior to (b). In some embodiments, the plurality of circular polynucleotides is not purified or isolated prior to (b).

In some embodiments, circularizing in (a) comprises the step of joining an adapter polynucleotide to the 5′ end, the 3′ end, or both the 5′ end and the 3′ end of a polynucleotide in the plurality of polynucleotides.

In some embodiments, amplifying the circular polynucleotides in (b) is effected by a polymerase having strand-displacement activity. In some embodiments, amplifying the circular polynucleotides in (b) comprises rolling circle amplification (RCA). In some embodiments, amplifying in (b) comprises subjecting the circular polynucleotides to an amplification reaction mixture comprising random primers. In some embodiments, individual random primers comprise sequences at their respective 5′ and/or 3′ ends distinct from each other. In some embodiments, amplifying in (b) comprises subjecting the circular polynucleotides to an amplification reaction mixture comprising target-specific primers. In some embodiments, amplifying comprises multiple cycles of denaturation, primer binding, and primer extension.

In some embodiments, the amplified polynucleotides are subjected to the sequencing of (c) without enrichment. In some embodiments, the method further comprises enriching one or more target polynucleotides among the amplified polynucleotides or amplification products thereof by performing an enrichment step prior to the sequencing of (c).

In some embodiments, the plurality of polynucleotides comprises single-stranded polynucleotides. In some embodiments, the sequence variant is a single nucleotide polymorphism. In some embodiments, the sample is a sample from a subject. In some embodiments, the sample comprises urine, stool, blood, saliva, tissue, or bodily fluid. In some embodiments, the sample comprises tumor cells. In some embodiments, the sample comprises a formalin-fixed paraffin-embedded (FFPE) sample. In some embodiments, the plurality of polynucleotides comprises cell-free polynucleotides. In some embodiments, the cell-free polynucleotides comprise cell-free DNA. In some embodiments, the cell-free polynucleotides comprise cell-free RNA. In some embodiments, the cell-free polynucleotides comprise circulating tumor DNA. In some embodiments, the cell-free polynucleotides comprise circulating tumor RNA.

In some embodiments, in (d), the method comprises calling a sequence difference detected in the sequencing reads as the sequence variant when the sequence difference occurs in at least 50% of the sequencing reads from the first amplified polynucleotide and at least 50% of the sequencing reads from the second amplified polynucleotide.
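The 50% rule can be written as a minimal Python sketch. The function names and the dictionary input format (a per-molecule list of booleans that marks whether each read shows the difference) are hypothetical. The sketch generalizes "first and second" to "at least two" molecules meeting the per-molecule threshold.

```python
def molecule_supports(calls: list[bool], threshold: float = 0.5) -> bool:
    """True if at least `threshold` of one molecule's reads show the difference."""
    return bool(calls) and sum(calls) / len(calls) >= threshold


def call_variant(reads_by_molecule: dict[str, list[bool]],
                 threshold: float = 0.5, min_molecules: int = 2) -> bool:
    """Call the difference when at least `min_molecules` distinct amplified
    polynucleotides each meet the per-molecule read threshold."""
    supporting = [m for m, calls in reads_by_molecule.items()
                  if molecule_supports(calls, threshold)]
    return len(supporting) >= min_molecules
```

For example, reads of 2/3 and 1/2 from two molecules satisfy the rule, but 2/3 and 1/3 do not.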

In an aspect, the present disclosure provides a method of identifying a sequence variant in a nucleic acid sample comprising a plurality of polynucleotides, each polynucleotide of the plurality having a 5′ end and a 3′ end, the method comprising (a) circularizing individual polynucleotides of the plurality to form a plurality of circular polynucleotides, wherein a given circular polynucleotide of the plurality has a junction sequence resulting from the circularization; (b) amplifying the circular polynucleotides of (a) to produce a plurality of amplified polynucleotides; (c) shearing the amplified polynucleotides to produce sheared polynucleotides, each sheared polynucleotide comprising one or more shear points at a 5′ end and/or a 3′ end; (d) sequencing amplification products of the sheared polynucleotides to produce a plurality of sequencing reads; and (e) calling a sequence difference detected in the sequencing reads as the sequence variant when the sequence difference occurs in sequencing reads corresponding to a first sheared polynucleotide and sequencing reads corresponding to a second sheared polynucleotide.

In some embodiments, circularizing individual polynucleotides in (a) is effected by a ligase enzyme. In some embodiments, prior to (b), the ligase enzyme is degraded. In some embodiments, uncircularized polynucleotides are degraded prior to (b). In some embodiments, the plurality of circular polynucleotides is not purified or isolated prior to (b).

In some embodiments, circularizing in (a) comprises the step of joining an adapter polynucleotide to the 5′ end, the 3′ end, or both the 5′ end and the 3′ end of a polynucleotide in the plurality of polynucleotides.

In some embodiments, amplifying the circular polynucleotides in (b) is effected by a polymerase having strand-displacement activity. In some embodiments, amplifying the circular polynucleotides in (b) comprises rolling circle amplification (RCA). In some embodiments, amplifying in (b) comprises subjecting the circular polynucleotides to an amplification reaction mixture comprising random primers. In some embodiments, individual random primers comprise sequences at their respective 5′ and/or 3′ ends distinct from each other. In some embodiments, amplifying in (b) comprises subjecting the circular polynucleotides to an amplification reaction mixture comprising target-specific primers.

In some embodiments, the amplification products of the sheared polynucleotides are subjected to sequencing without enrichment. In some embodiments, the method further comprises enriching one or more target polynucleotides among the amplification products of the sheared polynucleotides by performing an enrichment step prior to the sequencing of (d). In some embodiments, shearing in (c) comprises subjecting the amplified polynucleotides to sonication. In some embodiments, shearing in (c) comprises subjecting the amplified polynucleotides to enzymatic cleavage.

In some embodiments, the plurality of polynucleotides comprises single-stranded polynucleotides. In some embodiments, the sequence variant is a single nucleotide polymorphism. In some embodiments, the sample is a sample from a subject. In some embodiments, the sample comprises urine, stool, blood, saliva, tissue, or bodily fluid. In some embodiments, the sample comprises tumor cells. In some embodiments, the sample comprises a formalin-fixed paraffin-embedded (FFPE) sample. In some embodiments, the plurality of polynucleotides comprises cell-free polynucleotides. In some embodiments, the cell-free polynucleotides comprise cell-free DNA. In some embodiments, the cell-free polynucleotides comprise cell-free RNA. In some embodiments, the cell-free polynucleotides comprise circulating tumor DNA. In some embodiments, the cell-free polynucleotides comprise circulating tumor RNA.

In some embodiments, in (e), the method comprises calling a sequence difference detected in the sequencing reads as the sequence variant when the sequence difference occurs in at least 50% of the sequencing reads from the first sheared polynucleotide and at least 50% of the sequencing reads from the second sheared polynucleotide.

INCORPORATION BY REFERENCE

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:

FIG. 1 depicts a schematic of one embodiment of methods according to the present disclosure. DNA strands are circularized, and target-specific primers corresponding to the genes under investigation are added along with polymerase, dNTPs, buffers, etc., such that rolling circle amplification (RCA) occurs to form concatemers (e.g., “multimers”) of the template DNA (e.g., a “monomer”). The concatemers are treated to synthesize the corresponding complementary strand, and adapters are then added to make sequencing libraries. The resulting library, which is then sequenced using standard technologies, will generally contain three species: nDNA (“normal” DNA) that does not contain a rare sequence variant (e.g., a mutation); nDNA that contains enzymatic sequencing errors; and DNA that contains multimers of “real” or actual sequence variants that were pre-existing in the sample polynucleotides before amplification. The presence of multiple copies of the otherwise rare mutation allows the detection and identification of the sequence variant.

FIG. 2 depicts a strategy similar to that of FIG. 1 but with the addition of adapters to facilitate polynucleotide circularization. FIG. 2 also shows the use of target-specific primers.

FIG. 3 is similar to FIG. 2 except that adapter primers are used in amplification.

FIGS. 4A-4C depict three embodiments associated with the formation of circularized single-stranded (ss) DNA. In the top scheme, ssDNA is circularized in the absence of adapters; the middle scheme depicts the use of adapters; and the bottom scheme uses two adapter oligos (yielding different sequences on each end) and may further include a splint oligo that hybridizes to both adapters to bring the two ends into proximity.

FIG. 5 depicts an embodiment for circularizing specific targets through the use of a “molecular clamp” that brings the two ends of the single-stranded DNA into spatial proximity for ligation.

FIGS. 6A and 6B depict two schemes for the addition of adapters using blocked ends of the nucleic acids.

FIGS. 7A-7C depict three different ways to prime a rolling circle amplification (RCA) reaction. FIG. 7A shows the use of target-specific primers, e.g., primers directed to particular target genes or target sequences of interest. This generally results in only target sequences being amplified. FIG. 7B depicts the use of random primers to perform whole genome amplification (WGA), which generally amplifies all sample sequences; these are then sorted bioinformatically during processing. FIG. 7C depicts the use of adapter primers when adapters are used, which also results in general, non-target-specific amplification.

FIG. 8 depicts an example of double-stranded DNA circularization and amplification, such that both strands are amplified, in accordance with an embodiment.

FIGS. 9A-9D depict a variety of schemes to achieve complementary strand synthesis for subsequent sequencing. FIG. 9A depicts the use of random priming of the target strand, followed by ligation. FIG. 9B depicts the use of adapter priming of the target strand, similarly followed by ligation. FIG. 9C depicts the use of a “loop” adapter, wherein the adapter has two complementary sections of sequence that hybridize with each other to create a loop (e.g., a stem-loop structure). Upon ligation to the end of the concatemer, the free end of the loop serves as the primer for the complementary strand. FIG. 9D shows the use of hyper-branching random primers to achieve second strand synthesis.

FIG. 10 shows a PCR method in accordance with an embodiment that promotes sequencing of circular polynucleotides or strands containing at least two copies of a target nucleic acid sequence, using a pair of primers that are oriented away from one another when aligned within a monomer of the target sequence (also referred to as “back to back,” e.g. oriented in two directions but not on the ends of the domain to be amplified). In some embodiments, these primer sets are used after concatemers are formed to favor amplicons that are higher multimers, e.g. dimers, trimers, etc., of the target sequence. Optionally, the method can further include a size selection to remove amplicons that are smaller than dimers.

FIGS. 11A-11D depict an embodiment in which back to back (B2B) primers are used with a “touch up” PCR step, such that amplification of short products (such as monomers) is less favored. In this case, the primers have two domains: a first domain that hybridizes to the target sequence (grey or black arrow) and a second domain that is a “universal primer” binding domain (bent rectangles; also sometimes referred to as an adapter), which does not hybridize to the original target sequence. In some embodiments, the first rounds of PCR are done with a low temperature annealing step (FIG. 11A), such that gene-specific sequences bind. The low temperature run results in PCR products of various lengths, including short products (FIG. 11B). After a small number of rounds, the annealing temperature is raised, such that hybridization of the entire primer, both domains, is favored (FIG. 11C); as depicted, such sites are found at the ends of the templates, while internal binding is less stable. Shorter products are thus less favored at the higher temperature, where both domains hybridize, than at the lower temperature, where only a single domain is needed (FIG. 11D).

FIGS. 12A and 12B depict two different methods of sequencing library construction. FIG. 12A illustrates an example of the Illumina® Nextera sample preparation system, by which DNA can be simultaneously fragmented and tagged with sequencing adapters in a single step. In FIG. 12B, concatemers are fragmented by sonication, followed by adding adapters to both ends (e.g. by use of kits by KAPA Biosystems) and PCR amplification. Other methods are available.

FIGS. 13A-C provide an illustration of example advantages of back-to-back (B2B) primer design compared to traditional PCR primer design. Traditional PCR primer design (left) places the primers (arrows, A and B) in the regions flanking a target sequence, which may be a hotspot for mutations (black stars); the primers are typically at least 60 base pairs (bp) apart, resulting in a typical footprint of about 100 bp. In this illustration, the B2B primer design (right) places primers on one side of the target sequence. The two B2B primers face in opposite directions and may overlap (e.g. by about or less than about 12 bp, 10 bp, 5 bp, or less). Depending on B2B primer lengths, the total footprint in this illustration can be between 28-50 bp. Because of its larger footprint, the traditional design is more likely to have primer binding disrupted by fragmentation events, leading to loss of sequence information, whether for linear fragments (FIG. 13A), circularized DNA (FIG. 13B), or amplification products (FIG. 13C). Moreover, as illustrated in FIG. 13C, the B2B primer design captures junction sequences (also referred to as a “natural barcode”), which can be used to distinguish different polynucleotides.

FIG. 14 illustrates a method for generating templates for detecting sequence variants, in accordance with an embodiment (e.g. an example implementation of a process using circularized polynucleotides). DNA input is denatured into ssDNA, circularized by ligation, and non-circularized DNA is degraded by exonuclease digestion. Ligation efficiency is quantified by quantitative PCR (qPCR), comparing input DNA and circularized DNA amounts, typically yielding a ligation efficiency of at least about 80%. Circularized DNA is purified to exchange buffer, followed by whole genome amplification (WGA) with random primers and Phi29 polymerase. WGA products are purified and fragmented (e.g. by sonication) into short fragments of about or less than about 400 bp. The on-target rate of amplified DNA is quantified by qPCR, comparing the same amount of reference genome DNA to amplified DNA, typically showing an average on-target rate of about or more than about 95%.
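The description does not state how the qPCR measurements are converted into a ligation efficiency. One common approach, assumed here rather than taken from the disclosure, compares threshold cycle (Ct) values for equal volumes of input DNA and post-exonuclease (circularized) DNA, treating each cycle as multiplying template by (1 + E). The function name and the default E = 1.0 (perfect doubling) are illustrative assumptions.

```python
def ligation_efficiency(ct_input: float, ct_circularized: float,
                        amp_efficiency: float = 1.0) -> float:
    """Estimate the fraction of input DNA surviving as circles after exonuclease digestion.

    Assumes equal template volumes in both qPCR reactions and that each cycle
    multiplies template by (1 + amp_efficiency). A later Ct for the circularized
    sample means less template, so the returned ratio falls below 1.
    """
    delta_ct = ct_circularized - ct_input
    return (1.0 + amp_efficiency) ** (-delta_ct)


# Circularized DNA crossing threshold about 0.32 cycles after the input DNA
print(round(ligation_efficiency(20.00, 20.32), 2))  # 0.8, i.e. about 80%
```

If measured primer efficiency differs from perfect doubling, passing the measured E changes the estimate accordingly.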

FIGS. 15A-15C illustrate a further implementation of amplification with tailed B2B primers, implementing a “touch up” second phase of PCR at a higher temperature. The B2B primers contain a sequence-specific region (thick black line) and an adapter sequence (open box). At a lower, phase-one annealing temperature, the target-specific sequence anneals to the template to yield an initial monomer, and PCR products contain tandem repeats (FIG. 15A). In a second amplification phase at a higher temperature, hybridization of both the target-specific and adapter sequences is favored over hybridization of the target-specific sequence alone, decreasing the degree to which short products are preferentially produced (FIG. 15B). Without favoring the whole primer, internal annealing of the target-specific sequences rapidly increases the fraction of monomers (FIG. 15C, left).

FIG. 16 illustrates a comparison between the background noise (frequency of variants) detected by target sequencing methods using a Q30 filter with (bottom line) and without (top line) requiring that a sequence difference occur on two different polynucleotides (e.g. identified by different junctions) to be counted as a variant. Human genomic DNA (12878, Coriell Institute) was fragmented to 100-200 bp and included a 2% spike-in of genomic DNA (19240, Coriell Institute) containing a known SNP (CYP2C19). The true variant signal (marked peak) was not significantly above background (top, light grey plot). Background noise was decreased to about 0.1 by applying the validation filter (lower, black plot).
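A Q30 filter keeps base calls whose Phred quality is at least 30, i.e. an estimated error probability of at most 1 in 1,000 (Q = -10 log10 P). The sketch below assumes Sanger/Illumina 1.8+ FASTQ quality encoding (ASCII offset 33) and masks failing bases as N; the function names are hypothetical, and this is not necessarily how the filter used for FIG. 16 was implemented.

```python
def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mask_low_quality(seq: str, qual: str, min_q: int = 30, offset: int = 33) -> str:
    """Replace bases whose decoded quality is below min_q with 'N'.

    qual is a FASTQ quality string using ASCII offset 33 (Sanger / Illumina 1.8+).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must be the same length")
    return "".join(
        base if (ord(q_char) - offset) >= min_q else "N"
        for base, q_char in zip(seq, qual)
    )


# 'I' = Q40, '5' = Q20, '?' = Q30
print(mask_low_quality("ACGT", "I5?I"))  # ANGT
```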

FIG. 17 illustrates detection of sequence variants spiked in at various low frequencies in the population of polynucleotides (2%, 0.2%, and 0.02%), which are nonetheless significantly above background when applying a method of the disclosure.

FIGS. 18A and 18B illustrate results of an analysis of ligation efficiency and on-target rate of an embodiment of the disclosure.

FIG. 19 illustrates the preservation of allele frequencies, and substantial absence of bias, in a method in accordance with an embodiment of the disclosure.

FIG. 20 illustrates results for detection of sequence variants in a small input sample, in accordance with an embodiment.

FIG. 21 illustrates an example of high background in results for detection of sequence variants obtained according to standard sequencing methods, without requiring that a sequence difference occur on two different polynucleotides.

FIG. 22 provides graphs illustrating comparisons between GC content distributions of the genome and GC content distributions of sequencing results produced by a method in accordance with an embodiment of the disclosure (methods disclosed herein; left), sequencing results using an alternative sequencing library construction kit (Rubicon, Rubicon Genomics; middle), and cell-free DNA (cfDNA) generally as reported in the literature for 32 ng (right).

FIG. 23 provides a graph illustrating the size distribution of input DNA obtained from sequencing reads of a method in accordance with an embodiment.

FIG. 24 provides a graph illustrating uniform amplification across multiple targets by a random-priming method in accordance with an embodiment.

FIGS. 25A and 25B illustrate embodiments for the formation of polynucleotide multimers having identifiable junctions, in the absence of circularization. Polynucleotides (such as polynucleotide fragments, or cell-free DNA) are joined to form multimers having non-natural junctions useful in distinguishing independent polynucleotides in accordance with embodiments of the disclosure (also referred to herein as “auto-tag”). In FIG. 25A, polynucleotides are joined directly to one another by blunt-end ligation. In FIG. 25B, polynucleotides are joined via one or more intervening adapter oligonucleotides, which may further comprise a barcode sequence. Multimers are then subjected to amplification by any of a variety of methods, such as by random primers (whole genome amplification), adapter primers, or one or more target-specific primers or primer pairs.

FIG. 26 illustrates an example variation on the process of FIG. 25. Polynucleotides (e.g. cfDNA, or other polynucleotide fragments) are end-repaired, A-tailed, and adapter ligated (e.g. using a standard kit, such as kits by KAPA Biosystems). Carrier DNA labeled with internal uracil (U) can be supplemented to raise total DNA input to desired levels (e.g. to about or more than about 20 ng). A sequence variant to be detected is indicated by a “star”. When ligation is complete, carrier DNA can be degraded by addition of Uracil-Specific Excision Reagent (USER) enzyme, which is a mixture of Uracil DNA glycosylase (UDG) and the DNA glycosylase-lyase Endonuclease VIII. Products are purified to eliminate fragments of carrier DNA. Purified products are amplified (e.g. by PCR, using primers directed to adapter sequences). Any residual carrier DNA is not likely to be amplified, due to its degradation and its separation from an adapter on at least one end. Amplified products can be purified to remove short DNA fragments.

FIGS. 27A-27E illustrate an example variation on the process of FIG. 25. Target-specific amplification primers comprise a common 5′ “tail” that functions as an adapter (grey arrow). Initial amplification (e.g. by PCR) proceeds for a few cycles (e.g. at least about 5, 10, or more cycles). PCR products can serve as primers as well, annealing to other PCR products (e.g. when annealing temperature is reduced in a second phase) to produce concatemers having identifiable junctions. The second phase can comprise a number of cycles (e.g. 5, 10, 15, 20, or more cycles), and may include a selection or variation of conditions that favor concatemer formation and amplification. Methods according to this schematic are also referred to as “Relay Amp Seq”, which may find particular use in a compartmentalized setting (e.g. in a droplet).

FIGS. 28A-28E illustrate non-limiting examples of methods for circularizing polynucleotides. In FIG. 28A, double-stranded polynucleotides (e.g. dsDNA) are denatured into single strands, followed by direct circularization (e.g. self-joining ligation by CircLigase). In FIG. 28B, polynucleotides (e.g. DNA fragments) are end-repaired and A-tailed (adding a single-base adenosine extension to 3′ ends) to improve ligation efficiency, followed by denaturation to single strands and circularization. In FIG. 28C, polynucleotides are end-repaired and A-tailed (if double-stranded), joined to adapters having a thymidine (T) extension, denatured into single strands, and circularized. In FIG. 28D, polynucleotides are end-repaired and A-tailed (if double-stranded), both ends are ligated to an adapter having three elements (T extension for ligation, complementarity between adapters, and a 3′ tail), strands are denatured, and single-stranded polynucleotides are circularized (facilitated by complementarity between the adapter sequences). In FIG. 28E, double-stranded polynucleotides are denatured to single-stranded form and circularized in the presence of a molecular clamp that brings the ends of the polynucleotide closer together to facilitate joining.

FIG. 29 illustrates an example workflow design of an amplification system for identifying sequence variants in accordance with methods of the disclosure, particularly with regard to circularized polynucleotides.

FIG. 30 illustrates an example workflow design of an amplification system for identifying sequence variants in accordance with methods of the disclosure, particularly with regard to linear polynucleotide inputs without a circularization step.

FIG. 31 provides a summary illustration of example workflows for identifying sequence variants in accordance with methods of the disclosure. Along the “linear polynucleotide analysis” (upper) branch, analysis may include digital PCR (e.g. digital droplet PCR, ddPCR), real-time PCR, enrichment by probe capture (capture seq) with analysis of junction sequences (auto tag), sequencing based on inserted adapter sequences (barcoded insertion), or Relay Amp sequencing. Along the “circularized polynucleotide analysis” (lower) branch, analysis may include digital PCR (e.g. digital droplet PCR, ddPCR), real-time PCR, enrichment by probe capture (capture seq) with analysis of junction sequences (natural barcode), enrichment by probe capture or targeted amplification (e.g. B2B amplification), and sequence analysis with a validation step of identifying a sequence variant as a difference occurring in two different polynucleotides (e.g. polynucleotides having different junctions).

FIG. 32 is an illustration of a system according to an embodiment.

FIG. 33 illustrates the efficiency of the capture and the coverage along the target regions according to an example. More than 90% of the targeted bases are covered more than 20×, and more than 50% of the targeted bases have more than 50× coverage.
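The coverage statistics quoted for FIG. 33 are fractions of targeted positions whose read depth exceeds a threshold. A minimal sketch, assuming per-base depths over the targeted bases have already been computed; the function name and the depth values are made up for illustration.

```python
from typing import Sequence


def fraction_above(depths: Sequence[int], threshold: int) -> float:
    """Fraction of targeted positions with depth strictly greater than threshold."""
    if not depths:
        raise ValueError("no targeted positions supplied")
    return sum(d > threshold for d in depths) / len(depths)


toy_depths = [10, 25, 60, 80]            # hypothetical per-base depths
print(fraction_above(toy_depths, 20))    # 0.75
print(fraction_above(toy_depths, 50))    # 0.5
```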

FIGS. 34A and 34B illustrate exemplary single reaction assay workflows.

FIG. 35 illustrates a sequence variant call using junction information.

FIGS. 36A-36H illustrate steps of various embodiments of the invention.

FIGS. 37A and 37B illustrate embodiments where target polynucleotides include single-stranded polynucleotides.

FIG. 38 shows library complexity from various workflows described herein.

FIG. 39 illustrates an exemplary schematic in which concatemer shredding points and junction sequences are used to uniquely index reads amplified from the same origins.

FIG. 40 illustrates an exemplary schematic in which concatemer 5′ and 3′ sequences resulting from random priming during amplification and junction sequences are used to uniquely index reads amplified from the same origins.

FIGS. 41A-41C illustrate an exemplary schematic in which junction sequences and 5′/3′ ends of amplified polynucleotides are used to generate read families. In FIG. 41A, the group is counted as ‘variant’ because all reads in the group show the variant (‘x’) with concatemer confirmation. In FIG. 41B, the variant is rejected and the consensus of the read family is classified as wild-type because a majority of the reads in the family do not show the variant (‘x’). In FIG. 41C, a variant is called when the same sequence difference (‘x’) is detected in at least two different read families. A sequence difference (circle) that is not detected in at least two different read families is not called a variant.
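The decision rules of FIGS. 41A-41C can be sketched at a single reference position: take each read family's consensus base, then call a non-reference base only if it is the consensus of at least two families. The strict-majority consensus rule, the data layout (family identifier mapped to the bases its reads show at that position), and the function names are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional


def family_consensus(bases: List[str]) -> Optional[str]:
    """Return the base carried by a strict majority of reads in the family, else None."""
    if not bases:
        return None
    base, count = Counter(bases).most_common(1)[0]
    return base if 2 * count > len(bases) else None


def call_alt_bases(families: Dict[str, List[str]], ref_base: str,
                   min_families: int = 2) -> Dict[str, int]:
    """Map each non-reference consensus base to its number of supporting families,
    keeping only bases supported by at least min_families distinct families."""
    support: Counter = Counter()
    for bases in families.values():
        consensus = family_consensus(bases)
        if consensus is not None and consensus != ref_base:
            support[consensus] += 1
    return {alt: n for alt, n in support.items() if n >= min_families}


# Hypothetical bases observed at one position; the reference base is C.
families = {
    "fam1": ["T", "T", "T"],       # all reads show the difference (cf. FIG. 41A)
    "fam2": ["C", "C", "T"],       # majority wild-type, consensus is C (cf. FIG. 41B)
    "fam3": ["T", "T"],            # a second family carrying T (cf. FIG. 41C)
    "fam4": ["G", "C", "C", "C"],  # an isolated G is outvoted within its family
}
print(call_alt_bases(families, ref_base="C"))  # {'T': 2}
```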

DETAILED DESCRIPTION

The practice of some embodiments disclosed herein employs, unless otherwise indicated, conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and recombinant DNA, which are within the skill of the art. See for example Sambrook and Green, Molecular Cloning: A Laboratory Manual, 4th Edition (2012); the series Current Protocols in Molecular Biology (F. M. Ausubel, et al. eds.); the series Methods In Enzymology (Academic Press, Inc.); PCR 2: A Practical Approach (M. J. MacPherson, B. D. Hames and G. R. Taylor eds. (1995)); Harlow and Lane, eds. (1988) Antibodies, A Laboratory Manual; and Culture of Animal Cells: A Manual of Basic Technique and Specialized Applications, 6th Edition (R. I. Freshney, ed. (2010)).

The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 1 or more than 1 standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, up to 10%, up to 5%, or up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value. Where particular values are described in the application and claims, unless otherwise stated the term “about” meaning within an acceptable error range for the particular value should be assumed.

The terms “polynucleotide”, “nucleotide”, “nucleotide sequence”, “nucleic acid” and “oligonucleotide” are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three dimensional structure, and may perform any function, known or unknown. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. A polynucleotide may comprise one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component.

In general, the term “target polynucleotide” refers to a nucleic acid molecule or polynucleotide in a starting population of nucleic acid molecules having a target sequence whose presence, amount, and/or nucleotide sequence, or changes in one or more of these, are desired to be determined. In general, the term “target sequence” refers to a nucleic acid sequence on a single strand of nucleic acid. The target sequence may be a portion of a gene, a regulatory sequence, genomic DNA, cDNA, RNA including mRNA, miRNA, rRNA, or others. The target sequence may be a target sequence from a sample or a secondary target such as a product of an amplification reaction.

“Hybridization” refers to a reaction in which one or more polynucleotides react to form a complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues. The hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner according to base complementarity. The complex may comprise two strands forming a duplex structure, three or more strands forming a multi-stranded complex, a single self-hybridizing strand, or any combination of these. A hybridization reaction may constitute a step in a more extensive process, such as the initiation of PCR, or the enzymatic cleavage of a polynucleotide by an endonuclease. A second sequence that is complementary to a first sequence is referred to as the “complement” of the first sequence. The term “hybridizable” as applied to a polynucleotide refers to the ability of the polynucleotide to form a complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues in a hybridization reaction.

“Complementarity” refers to the ability of a nucleic acid to form hydrogen bond(s) with another nucleic acid sequence by either traditional Watson-Crick or other non-traditional types of base pairing. A percent complementarity indicates the percentage of residues in a nucleic acid molecule which can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 5, 6, 7, 8, 9, 10 out of 10 being 50%, 60%, 70%, 80%, 90%, and 100% complementary, respectively). “Perfectly complementary” means that all the contiguous residues of a nucleic acid sequence will hydrogen bond with the same number of contiguous residues in a second nucleic acid sequence. “Substantially complementary” as used herein refers to a degree of complementarity that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, or more nucleotides, or refers to two nucleic acids that hybridize under stringent conditions. Sequence identity, such as for the purpose of assessing percent complementarity, may be measured by any suitable alignment algorithm, including but not limited to the Needleman-Wunsch algorithm (see e.g. the EMBOSS Needle aligner available at www.ebi.ac.uk/Tools/psa/emboss_needle/nucleotide.html, optionally with default settings), the BLAST algorithm (see e.g. the BLAST alignment tool available at blast.ncbi.nlm.nih.gov/Blast.cgi, optionally with default settings), or the Smith-Waterman algorithm (see e.g. the EMBOSS Water aligner available at www.ebi.ac.uk/Tools/psa/embosswater/nucleotide.html, optionally with default settings). Optimal alignment may be assessed using any suitable parameters of a chosen algorithm, including default parameters.
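The percent-complementarity example above (e.g. 9 of 10 residues pairing being 90% complementary) can be computed directly for two equal-length sequences aligned antiparallel without gaps. This sketch counts only Watson-Crick pairs and ignores the non-traditional pairing and gapped alignments that the definition also allows; the function name is illustrative.

```python
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(seq1: str, seq2: str) -> float:
    """Percent of positions in seq1 that Watson-Crick pair with seq2.

    Both sequences are written 5'->3' and are assumed to be aligned antiparallel
    over their full, equal lengths with no gaps, so seq1[i] pairs with seq2[-1 - i].
    """
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    partner = seq2.upper()[::-1]
    pairs = sum(_COMPLEMENT.get(a) == b for a, b in zip(seq1.upper(), partner))
    return 100.0 * pairs / len(seq1)


print(percent_complementarity("ACGTACGTAC", "GTACGTACGT"))  # 100.0 (reverse complement)
print(percent_complementarity("ACGTACGTAC", "GTACGTACGA"))  # 90.0 (9 of 10 pair)
```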

In general, “stringent conditions” for hybridization refer to conditions under which a nucleic acid having complementarity to a target sequence predominantly hybridizes with the target sequence, and substantially does not hybridize to non-target sequences. Stringent conditions are generally sequence-dependent and vary depending on a number of factors. In general, the longer the sequence, the higher the temperature at which the sequence specifically hybridizes to its target sequence. Non-limiting examples of stringent conditions are described in detail in Tijssen (1993), Laboratory Techniques In Biochemistry And Molecular Biology-Hybridization With Nucleic Acid Probes Part I, Second Chapter “Overview of principles of hybridization and the strategy of nucleic acid probe assay”, Elsevier, N.Y.

In one aspect, the disclosure provides a method of identifying a sequence variant in a nucleic acid sample comprising a plurality of polynucleotides, each polynucleotide of the plurality having a 5′ end and a 3′ end. In some cases, the method comprises (a) circularizing individual polynucleotides of the plurality to form a plurality of circular polynucleotides, wherein a given circular polynucleotide of the plurality has a junction sequence resulting from said circularization; (b) amplifying the circularized polynucleotides of (a) to produce a plurality of amplified polynucleotides; (c) shearing the amplified polynucleotides to produce sheared polynucleotides, each sheared polynucleotide comprising one or more shear points at a 5′ end and/or a 3′ end; (d) sequencing the sheared polynucleotides and/or amplification products of the sheared polynucleotides to produce a plurality of sequencing reads; and (e) calling a sequence difference detected in the sequencing reads as the sequence variant when the sequence difference occurs in sequencing reads corresponding to a first sheared polynucleotide and a second sheared polynucleotide.

In some cases, the method comprises (a) circularizing individual polynucleotides of said plurality to form a plurality of circular polynucleotides, each of which has a junction between the 5′ end and the 3′ end; (b) amplifying the circular polynucleotides of (a) to produce amplified polynucleotides; (c) shearing the amplified polynucleotides to produce sheared polynucleotides, each sheared polynucleotide comprising one or more shear points at a 5′ end and/or 3′ end; (d) sequencing the sheared polynucleotides to produce a plurality of sequencing reads; (e) identifying sequence differences between sequencing reads and a reference sequence; and (f) calling a sequence difference as the sequence variant when the sequence difference occurs in at least two different sheared polynucleotides.
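Steps (e) and (f) can be sketched as follows, assuming reads have already been aligned to the reference without gaps and that each read carries a key identifying the sheared polynucleotide it came from (here, a hypothetical pair of shear-point coordinates). Because reads from PCR copies of one sheared polynucleotide share a key, they count only once toward the two-polynucleotide requirement. The record type and function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Set, Tuple

Difference = Tuple[int, str, str]  # (reference position, reference base, read base)


class AlignedRead(NamedTuple):
    molecule: Tuple[int, int]  # hypothetical key: shear points of the source sheared polynucleotide
    ref_start: int             # 0-based reference coordinate of the read's first base
    seq: str                   # read bases, assumed to align to the reference without gaps


def find_differences(read: AlignedRead, reference: str) -> Iterator[Difference]:
    """Step (e): yield each position where the read differs from the reference."""
    for offset, base in enumerate(read.seq):
        pos = read.ref_start + offset
        if pos < len(reference) and base != reference[pos]:
            yield (pos, reference[pos], base)


def call_variants(reads: Iterable[AlignedRead], reference: str,
                  min_molecules: int = 2) -> List[Difference]:
    """Step (f): keep differences seen in at least min_molecules distinct sheared polynucleotides."""
    support: Dict[Difference, Set[Tuple[int, int]]] = defaultdict(set)
    for read in reads:
        for diff in find_differences(read, reference):
            support[diff].add(read.molecule)
    return sorted(d for d, molecules in support.items() if len(molecules) >= min_molecules)


reference = "ACGTACGTAC"
reads = [
    AlignedRead((0, 8), 0, "ACGAACGT"),  # T>A at position 3
    AlignedRead((0, 8), 0, "ACGAACGT"),  # PCR copy of the same sheared polynucleotide
    AlignedRead((2, 9), 2, "GAACGTA"),   # T>A at position 3 in a second polynucleotide
    AlignedRead((1, 8), 1, "CGTACCT"),   # G>C at position 6, one polynucleotide only
]
print(call_variants(reads, reference))  # [(3, 'T', 'A')]
```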

In general, joining ends of a polynucleotide to one another to form a circular polynucleotide (either directly, or with one or more intermediate adapter oligonucleotides) produces a junction having a junction sequence. Where the 5′ end and 3′ end of a polynucleotide are joined via an adapter polynucleotide, the term “junction” can refer to a junction between the polynucleotide and the adapter (e.g. one of the 5′ end junction or the 3′ end junction), or to the junction between the 5′ end and the 3′ end of the polynucleotide as formed by and including the adapter polynucleotide. Where the 5′ end and the 3′ end of a polynucleotide are joined without an intervening adapter (e.g. the 5′ end and 3′ end of a single-stranded DNA), the term “junction” refers to the point at which these two ends are joined. A junction may be identified by the sequence of nucleotides comprising the junction (also referred to as the “junction sequence”).

In some embodiments, samples comprise polynucleotides having a mixture of ends formed by natural degradation processes (such as cell lysis, cell death, and other processes by which polynucleotides such as DNA and RNA are released from a cell to its surrounding environment, in which they may be further degraded, e.g., cell-free polynucleotides, e.g., cell-free DNA and cell-free RNA), fragmentation that is a byproduct of sample processing (such as fixing, staining, and/or storage procedures), and fragmentation by methods that cleave DNA without restriction to specific target sequences (e.g. mechanical fragmentation, such as by sonication; non-sequence-specific nuclease treatment, such as DNase I or fragmentase). Where samples comprise polynucleotides having a mixture of ends, the likelihood of two polynucleotides having the same 5′ end or 3′ end is low, and the likelihood that two polynucleotides will independently have both the same 5′ end and 3′ end is lower. Accordingly, in some embodiments, junctions may be used to distinguish different polynucleotides, even where the two polynucleotides comprise a portion having the same target sequence. Where polynucleotide ends are joined without an intervening adapter, a junction sequence may be identified by alignment to a reference sequence. For example, where the order of two component sequences appears to be reversed with respect to the reference sequence, the point at which the reversal appears to occur may be an indication of a junction at that point. Where polynucleotide ends are joined via one or more adapter sequences, a junction may be identified by proximity to the known adapter sequence, or by alignment as above if a sequencing read is of sufficient length to obtain sequence from both the 5′ and 3′ ends of the circularized polynucleotide. In some embodiments, the formation of a particular junction is a sufficiently rare event that it is unique among the circularized polynucleotides of a sample.
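The alignment cue described above, where two parts of a read map to the reference in reversed order, can be sketched with exact string matching. This is a simplification assumed for illustration: real reads contain errors and repeats, so a practical implementation would use a split-read aligner rather than `str.find`, which returns only the first exact occurrence. The function name, the `min_anchor` parameter, and the toy sequences are hypothetical.

```python
from typing import Optional, Tuple


def find_junction(read: str, reference: str,
                  min_anchor: int = 8) -> Optional[Tuple[int, int, int]]:
    """Locate a circularization junction inside a read by exact split mapping.

    Returns (split_index_in_read, reference_end_of_left_part, reference_start_of_right_part)
    for the first split at which both parts occur exactly in the reference and the
    right part maps upstream of where the left part ends (order reversed relative to
    the reference). Returns None if no such split exists.
    """
    for i in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:i], read[i:]
        left_pos = reference.find(left)
        right_pos = reference.find(right)
        if left_pos == -1 or right_pos == -1:
            continue
        left_end = left_pos + len(left)
        if right_pos < left_end:  # right part lies upstream: order is reversed
            return i, left_end, right_pos
    return None


# Toy reference: a fragment flanked by the sequence it was cut away from.
fragment = "CGATCGGTACCTTAGCAGTTCCAGG"
reference = "AAAAAAAA" + fragment + "TTTTTTTT"
# A read across the junction of the circularized fragment: its 3' end, then its 5' end.
read = fragment[-10:] + fragment[:10]
i, left_end, right_start = find_junction(read, reference)
print(i, left_end, right_start)                 # 10 33 8
print(read[i - 4:i + 4])                        # CAGGCGAT (4 nt on each side of the junction)
print(find_junction(fragment[:20], reference))  # None: contiguous read, no junction
```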

In some embodiments, circularizing individual polynucleotides in (a) is effected by subjecting the plurality of polynucleotides to a ligation reaction. The ligation reaction may comprise a ligase enzyme. In some embodiments, the ligase enzyme is degraded prior to amplifying in (b). Degradation of ligase prior to amplifying in (b) can increase the recovery rate of amplifiable polynucleotides. In some embodiments, the plurality of circularized polynucleotides are not purified or isolated prior to (b). In some embodiments, uncircularized, linear polynucleotides are degraded prior to amplifying.

In some cases, circularizing in (a) comprises the step of joining an adapter polynucleotide to the 5′ end, the 3′ end, or both the 5′ end and the 3′ end of a polynucleotide in the plurality of polynucleotides. As previously described, where the 5′ end and/or 3′ end of a polynucleotide are joined via an adapter polynucleotide, the term “junction” can refer to the junction between the polynucleotide and the adapter (e.g., one of the 5′ end junction or the 3′ end junction), or to the junction between the 5′ end and the 3′ end of the polynucleotide as formed by and including the adapter polynucleotide.

The circularized polynucleotides can be amplified, for example, after degradation of the ligase enzyme, to yield amplified polynucleotides. Amplifying the circular polynucleotides in (b) can be effected by a polymerase having strand-displacement activity. In some cases, the polymerase is a Phi29 DNA polymerase. In some cases, amplification comprises rolling circle amplification (RCA). The amplified polynucleotides resulting from RCA can comprise linear concatemers, or polynucleotides comprising two or more copies of a target sequence (e.g., subunit sequence) from a template polynucleotide. In some embodiments, amplifying comprises subjecting the circular polynucleotides to an amplification reaction mixture comprising random primers. In some cases, amplifying comprises subjecting the circular polynucleotides to an amplification reaction mixture comprising one or more primers, each of which specifically hybridizes to a different target sequence via sequence complementarity.

The amplified polynucleotides are sheared, in some cases, to produce sheared polynucleotides that are shorter in length relative to the unsheared polynucleotides. Two or more sheared polynucleotides originating from the same linear concatemer may have the same junction sequence but can have different 5′ and/or 3′ ends (e.g., shear ends).

Amplified polynucleotides can be sheared using any of a variety of methods, such as, but not limited to, physical fragmentation, enzymatic methods, and chemical fragmentation. Non-limiting examples of physical fragmentation methods that can be employed for the fragmentation of amplified polynucleotides include acoustic shearing, sonication, and hydrodynamic shearing. In some cases, acoustic shearing and sonication may be preferred. Non-limiting examples of enzymatic fragmentation methods that can be employed for the fragmentation of amplified polynucleotides include use of enzymes such as DNase I and other nucleases, including restriction endonucleases and non-specific nucleases, and transposases. Non-limiting examples of chemical fragmentation methods that can be employed for the fragmentation of amplified polynucleotides include use of heat and divalent metal cations.

Sheared polynucleotides (also referred to as fragmented polynucleotides), which are shorter in length than the unsheared polynucleotides, may be desired to match the capabilities of the sequencing instrument used for producing sequencing reads (also referred to as sequence reads). For example, amplified polynucleotides may be fragmented, for example sheared, to the optimal length determined by the downstream sequencing platform. Various sequencing instruments, further described herein, can accommodate nucleic acids of different lengths. In some cases, amplified polynucleotides are sheared in the process of attaching adapters useful in downstream sequencing platforms, for example in flow cell attachment or sequencing primer binding. In some cases, sheared polynucleotides are subjected to amplification to produce amplification products of the sheared polynucleotides prior to sequencing. Additional amplification can be desirable, for example, to generate a sufficient amount of polynucleotides for downstream analysis, such as sequencing analysis. The resulting amplification products can comprise multiple copies of individual sheared polynucleotides.

During sequencing, sheared polynucleotides or amplification products thereof originating from the same amplified polynucleotide can be sequenced. Sequencing reads resulting from sequencing can be grouped into read families. A read family can comprise any suitable number of sequence reads. In some cases, a read family comprises at least 5, 10, 15, 20, 25, 50, 75, or 100 sequence reads. In some cases, a group of sequence reads may not be identified as a read family unless a minimum number of sequence reads are present. For example, a read family can comprise at least 2, 3, 4, 5, 7, 8, 9, or 10 sequence reads. In some cases, a read family comprises at least 25 sequence reads. In some cases, sequence reads may be classified as a read family based on a shared junction sequence and shared sequences at the 5′ and 3′ ends. In some embodiments, the sequence reads of a read family have the same junction sequence. In some embodiments, the sequence reads of a read family have the same sequences at the 5′ and 3′ ends; for example, the sequences may be identical over at least 5 bases, 6 bases, 7 bases, 8 bases, 9 bases, or 10 bases at each of the 5′ and 3′ ends. In some cases, the sequences at the 5′ and 3′ ends are not identical amongst all sequence reads of a read family due to amplification and/or sequencing errors. The sequencing reads of a read family may exhibit overlap when compared, for example by alignment. In some cases, the sequencing reads of a read family exhibit at least 75% identity when optimally aligned. The term “percent (%) identity” refers to the percentage of identical residues shared between two sequences, e.g., a candidate sequence and a reference sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent identity (i.e., gaps can be introduced in one or both of the candidate and reference sequences for optimal alignment and, in some cases, non-homologous sequences can be disregarded for comparison purposes).
Alignment, for purposes of determining percent identity, can be achieved in various ways, for instance, using publicly available computer software such as BLAST, ALIGN, or Megalign (DNASTAR) software. Percent identity of two sequences can be calculated by aligning a test sequence with a comparison sequence using BLAST, determining the number of amino acids or nucleotides in the aligned test sequence that are identical to amino acids or nucleotides in the same position of the comparison sequence, and dividing the number of identical amino acids or nucleotides by the number of amino acids or nucleotides in the comparison sequence. Two sequencing reads of a family can exhibit at least 75% identity (e.g., at least 80%, 85%, 90%, or 95% identity) over any suitable length of bases, when optimally aligned. A first pair of sequencing reads in a read family can exhibit a % identity that is different from a second pair of sequencing reads. In some cases, the % identity is determined for an alignment over a length of at least 50 bases (e.g., at least 60 bases, 70 bases, 80 bases, 90 bases, 100 bases, 110 bases, 120 bases, 130 bases, 140 bases, or 150 bases). In some cases, the alignment is over a length of between about 25-250 bases, between about 50-200 bases, between about 75-175 bases, or between about 100-150 bases. In some cases, the alignment is over the entire length of the test sequence or the comparison sequence. In some embodiments, two sequencing reads of a read family exhibit at least 75% identity (e.g., at least 80%, 85%, 90%, or 95% identity) over a length of at least 50 bases (e.g., at least 60 bases, 70 bases, 80 bases, 90 bases, 100 bases, 110 bases, 120 bases, 130 bases, 140 bases, or 150 bases) when optimally aligned.
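The percent-identity calculation described above can be illustrated with a short sketch. It assumes the two sequences have already been optimally aligned (for example, with BLAST) and are supplied as equal-length strings in which `-` marks a gap; the function name and gap symbol are hypothetical choices for illustration, not part of any named software package.

```python
def percent_identity(aligned_test: str, aligned_comparison: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Counts positions where both sequences carry the same residue (a gap never
    counts as identical) and divides by the number of residues (non-gap
    characters) in the comparison sequence.
    """
    if len(aligned_test) != len(aligned_comparison):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        1
        for a, b in zip(aligned_test.upper(), aligned_comparison.upper())
        if a == b and a != gap
    )
    residues_in_comparison = sum(1 for b in aligned_comparison if b != gap)
    if residues_in_comparison == 0:
        raise ValueError("comparison sequence contains no residues")
    return 100.0 * identical / residues_in_comparison


# Example: 8 identical positions out of 10 residues in the comparison sequence.
print(percent_identity("ACGT-ACGTA", "ACGTTACCTA"))  # 80.0
```

Dividing by the residues of the comparison sequence, rather than by the alignment length, follows the definition given above; other tools may use a different denominator.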

Amplified polynucleotides comprising linear concatemers of a circular polynucleotide template can comprise multiple repeats or copies of the circular polynucleotide template sequence. Sheared polynucleotides produced from an amplified polynucleotide can have various numbers of copies of the circular polynucleotide template sequence. A sheared polynucleotide can have less than one copy of the repeat sequence, at least one copy of the repeat sequence, at least two copies of the repeat sequence, or at least three copies of the repeat sequence. The number of repeats in sheared polynucleotides can depend on the length of the repeat sequence. For example, for sheared fragments of approximately the same size, a concatemer having repeats of relatively shorter length can yield sheared fragments having more copies of the repeat sequence compared to a concatemer having repeats of longer length.
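As a rough, idealized illustration of the relationship described above, assume each sheared fragment is a contiguous stretch of the concatemer and ignore any adapter sequence; the expected number of repeat copies in a fragment is then the ratio of the fragment length to the repeat (template) length. The function name and the 300-base fragment length used below are hypothetical example values.

```python
def expected_repeat_copies(fragment_length: int, repeat_length: int) -> float:
    """Approximate number of template copies in one sheared concatemer fragment.

    Idealized: the fragment is a contiguous stretch of a linear concatemer whose
    repeat unit equals the circular template length, so the copy number is the
    ratio of the two lengths (fractional values mean partial copies).
    """
    if fragment_length < 0 or repeat_length <= 0:
        raise ValueError("lengths must be non-negative and repeat_length positive")
    return fragment_length / repeat_length


# Same fragment size, different template (repeat) lengths.
for repeat in (60, 150, 400):
    print(repeat, expected_repeat_copies(300, repeat))
```

For 300-base fragments, a 60-base repeat gives 5 copies, a 150-base repeat gives 2 copies, and a 400-base repeat gives 0.75 of a copy, matching the statement that shorter repeats yield more copies per fragment.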

A sequencing read of a sheared polynucleotide or amplification product thereof can in some cases comprise at least one copy of the repeat sequence. In some cases, the sequencing read comprises at least two copies of the repeat sequence (e.g., at least three copies, four copies, or five copies). The average number of copies of the repeat sequence from sequence reads of a read family can depend on the length of the polynucleotides of the nucleic acid sample.

Sequencing reads can be grouped into read families by first identifying the length and/or sequence of the repeated segment in the concatemer, which corresponds to the sequence of the circular polynucleotide template. In some cases, identifying the length and/or sequence of the repeated segment comprises alignment of reads to other reads or alignment to reference sequences. Next, the junction sequence can be identified, for example by alignment to a reference sequence. The sequences of the 5′ and 3′ ends of the polynucleotide and their relative distances (e.g., in bases) from the junction can be determined. Reads having the same junction sequence and shared sequences at the 5′ and 3′ ends can be grouped into a read family, representing the sequencing reads of amplification products originating from the same sheared polynucleotide.
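The grouping procedure above might be sketched as follows, assuming each read has already been processed so that its junction sequence and its 5′ and 3′ end sequences are available as strings. The `Read` record, the 8-base end length (within the 5 to 10 base range mentioned above), and the minimum family size of 3 are hypothetical illustrative choices. The sketch uses exact matching on the grouping key; tolerating amplification or sequencing errors at the ends would require approximate matching, which is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    # Hypothetical pre-processed read: junction already located, ends extracted.
    read_id: str
    junction: str      # junction sequence, e.g. found by alignment to a reference
    five_prime: str    # sequence at the 5' end of the read
    three_prime: str   # sequence at the 3' end of the read


def group_read_families(reads, end_length=8, min_family_size=3):
    """Group reads sharing a junction and their first/last `end_length` bases.

    Groups with fewer than `min_family_size` reads are not reported as
    read families.
    """
    groups = defaultdict(list)
    for read in reads:
        key = (
            read.junction,
            read.five_prime[:end_length],
            read.three_prime[-end_length:],
        )
        groups[key].append(read)
    return {key: members for key, members in groups.items()
            if len(members) >= min_family_size}


reads = [
    Read("r1", "GGATCC", "ACGTACGTAA", "TTGCATGCAT"),
    Read("r2", "GGATCC", "ACGTACGTAA", "TTGCATGCAT"),
    Read("r3", "GGATCC", "ACGTACGTAA", "TTGCATGCAT"),
    Read("r4", "GGATCC", "CCCCACGTAA", "TTGCATGCAT"),  # different 5' end
]
families = group_read_families(reads)
```

Here `r1` to `r3` form one read family, while `r4` falls below the minimum family size on its own and is not reported.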

A sequence difference observed in a read family can, in some cases, be called a true sequence difference, as opposed to a result of amplification and/or sequencing error, by confirming that the sequence difference also occurs in a second read family having the same junction sequence but different sequences at the respective 5′ and 3′ ends (e.g., read families from at least two different sheared polynucleotides). Two read families having the same junction sequence but different 5′ and/or 3′ ends can correspond to two sheared polynucleotides of the same linear concatemer. Observing the sequence difference in two read families corresponding to two sheared polynucleotides of the same amplified polynucleotide can be one way to confirm that the sequence difference is truly present on the circular polynucleotide and not the result of amplification and/or sequencing error in one of the sheared polynucleotides.
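The confirmation step above can be expressed as a check over per-family results. This sketch assumes each read family has already been labeled as carrying or not carrying the difference, and treats read families that share a junction but differ at either end as different sheared polynucleotides; the input layout and function name are hypothetical.

```python
from collections import defaultdict


def confirmed_junctions(family_calls):
    """Return junctions at which a difference is seen in >= 2 distinct fragments.

    `family_calls` maps (junction, five_prime_end, three_prime_end) to True if
    that read family carries the sequence difference. Families sharing a
    junction but differing at either end count as different sheared
    polynucleotides of the same concatemer.
    """
    ends_by_junction = defaultdict(set)
    for (junction, five_end, three_end), carries_difference in family_calls.items():
        if carries_difference:
            ends_by_junction[junction].add((five_end, three_end))
    return {j for j, ends in ends_by_junction.items() if len(ends) >= 2}


calls = {
    ("GGATCC", "ACGTACGT", "GCATGCAT"): True,
    ("GGATCC", "TTTTACGT", "GCATGCAT"): True,   # second fragment, same junction
    ("AAGCTT", "ACGTACGT", "GCATGCAT"): True,
    ("AAGCTT", "CCCCACGT", "GCATGCAT"): False,  # difference absent here
}
```

Only the `GGATCC` junction is confirmed in this example, because the difference at `AAGCTT` is supported by a single sheared polynucleotide.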

In some cases, a sequence difference observed in sequence reads of a read family is considered a sequence difference of the read family if the sequence difference occurs in a majority of the sequencing reads of the read family. In some cases, the sequence difference observed in sequence reads of the read family is considered a sequence difference of the read family if the sequence difference occurs in at least 50% of sequencing reads of the read family (e.g., at least 60%, 70%, 80%, 90%, or 95% of sequencing reads). In some cases, the sequence difference observed in sequence reads of the read family is considered a sequence difference of the read family if the sequence difference occurs in 100% of sequencing reads of the read family. In some cases, a sequence difference detected in the sequencing reads is called as the sequence variant when the sequence difference occurs in a majority of the sequencing reads from a first sheared polynucleotide and a majority of sequencing reads from a second sheared polynucleotide. In some cases, a sequence difference detected in the sequencing reads is called as the sequence variant when the sequence difference occurs in at least 50% of the sequencing reads (e.g., at least 60%, 70%, 80%, 90%, or 95% of sequencing reads) from the first sheared polynucleotide and at least 50% of sequencing reads (e.g., at least 60%, 70%, 80%, 90%, or 95% of sequencing reads) from the second sheared polynucleotide. In some cases, a sequence difference detected in the sequencing reads is called as the sequence variant when the sequence difference occurs in 100% of the sequencing reads from the first sheared polynucleotide and 100% of sequencing reads from the second sheared polynucleotide.
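The per-family thresholds above can be sketched as two small functions. The 50% default and the `strict` flag (which distinguishes a strict majority from "at least 50%") are illustrative assumptions, and each fragment's read family is summarized as a hypothetical (reads with difference, total reads) pair.

```python
def family_supports(n_with_difference: int, n_reads: int,
                    min_fraction: float = 0.5, strict: bool = False) -> bool:
    """Decide whether a read family carries a sequence difference.

    With strict=True the fraction must exceed min_fraction (a true majority
    when min_fraction=0.5); otherwise it must be at least min_fraction.
    """
    if n_reads <= 0:
        raise ValueError("a read family must contain at least one read")
    fraction = n_with_difference / n_reads
    return fraction > min_fraction if strict else fraction >= min_fraction


def call_from_two_fragments(first, second, min_fraction=0.5, strict=False):
    """Call the variant only if the read families of both fragments support it.

    `first` and `second` are (reads_with_difference, total_reads) tuples for
    the first and second sheared polynucleotide.
    """
    return (family_supports(*first, min_fraction=min_fraction, strict=strict)
            and family_supports(*second, min_fraction=min_fraction, strict=strict))
```

Setting `min_fraction=1.0` reproduces the 100% embodiment, in which a single discordant read in either family prevents the call.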

By using two different sheared polynucleotides, that is, two sheared polynucleotides having the same junction sequence but different shear ends, to confirm the presence of a sequence difference identified from sequencing reads in a sample, sequence variant detection can be improved. True sequence variants are expected to be found in at least two sheared polynucleotides originating from the same amplified polynucleotide, whereas errors are expected to be found in fewer than two sheared polynucleotides. In some cases, the error rate of variant detection is reduced. In some embodiments, the error rate of variant detection is reduced by at least 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, or 50%. In some cases, the sensitivity and/or specificity of variant detection is increased. In some embodiments, the sensitivity of variant detection is increased by at least 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, or 50%. In some embodiments, the specificity of variant detection is increased by at least 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, or 50%. In some cases, the false positive rate is decreased.
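One way to see why requiring two sheared polynucleotides suppresses errors is an idealized independence model, which is an assumption and not a measured result. If an amplification or sequencing error occurs at a given position in any one sheared polynucleotide with probability p, and errors in different sheared polynucleotides arise independently, then the probability that both carry an error at that position, which bounds the probability that both carry the same erroneous base, is:

```latex
P_{\text{false call}} \le p \cdot p = p^{2},
\qquad \text{e.g. } p = 10^{-2} \;\Rightarrow\; P_{\text{false call}} \le 10^{-4}
```

Here p = 1% corresponds to the upper end of the 0.1-1% error rates reported for standard high throughput sequencing. The bound does not apply to changes that are shared by all fragments, such as changes already present in the circular template before amplification.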

In some cases, the sequence difference is called as the sequence variant when, in addition, (i) the sequence difference occurs in at least two circular polynucleotides having different junctions; (ii) the sequence difference is identified on both strands of a double-stranded input molecule; and/or (iii) the sequence difference occurs in a consensus sequence for a concatemer formed by amplification comprising rolling circle amplification (RCA). In some cases, the reference sequence is a sequencing read. In some cases, the reference sequence is a consensus sequence formed by aligning the sequencing reads with one another.
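Criterion (ii) can be sketched as follows for a single-base substitution. The sketch assumes hypothetical inputs: one read in reference orientation from one strand of the double-stranded input molecule, and one read from the complementary strand, each a full-length, gap-free copy of the same region. The complementary-strand read is reverse-complemented so that both reads can be compared at the same reference position.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def seen_on_both_strands(plus_read: str, minus_read: str,
                         position: int, alt_base: str) -> bool:
    """True if a substitution is present in reads from both strands.

    plus_read: read from one strand, in reference orientation (5'->3').
    minus_read: read from the complementary strand, as sequenced (5'->3').
    position: 0-based reference position of the substitution.
    """
    minus_in_reference_orientation = reverse_complement(minus_read)
    return (plus_read[position] == alt_base
            and minus_in_reference_orientation[position] == alt_base)


# Reference ACGTACGT with a G->T change at position 2 on both strands.
plus = "ACTTACGT"
minus = reverse_complement(plus)  # "ACGTAAGT", the complementary strand 5'->3'
```

A difference that appears on only one strand, for example an error introduced while copying that strand, fails this check.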

In some cases, the sheared polynucleotides are subjected to sequencing without enrichment. However, if desired, enriching one or more target polynucleotides among the amplified polynucleotides and/or sheared polynucleotides can be performed in an enrichment step prior to sequencing. Exemplary enrichment steps may include the use of nucleic acids with a sequence complementary to a target sequence.

The sequence variant, as described further herein, can be any variation with respect to the reference sequence. Non-limiting examples of sequence variants that can be detected using methods herein include single nucleotide polymorphisms (SNP), deletion/insertion polymorphisms (DIP), copy number variants (CNV), short tandem repeats (STR), simple sequence repeats (SSR), variable number of tandem repeats (VNTR), amplified fragment length polymorphisms (AFLP), retrotransposon-based insertion polymorphisms, sequence specific amplified polymorphism, and differences in epigenetic marks that can be detected as sequence variants (e.g. methylation differences). In some cases, the sequence variant is a polymorphism, such as a single-nucleotide polymorphism. In some cases, the sequence variant is a causal genetic variant. In some cases, the sequence variant is associated with a type or stage of cancer.

The nucleic acid sample can be a sample from a subject. In some cases, the sample is from a human subject. In some cases, the sample comprises urine, stool, blood, saliva, tissue, or bodily fluid from a subject, such as a human subject. In some cases, the sample comprises tumor cells. In some cases, the sample comprises a formalin-fixed paraffin-embedded sample. In some cases, the plurality of polynucleotides of the sample comprises cell-free polynucleotides. The cell-free polynucleotides may comprise cell-free DNA, and in some cases, circulating tumor DNA. The cell-free polynucleotides may comprise cell-free RNA, and in some cases, circulating tumor RNA. In some embodiments, the method further comprises diagnosing, and optionally treating, the subject based on calling of the sequence variant. In some cases, a microbial contaminant in a sample is identified based on calling of the sequence variant. In such cases, the sample can be from a subject but may also be a non-subject sample, such as a soil sample or food sample.

The plurality of polynucleotides can be single-stranded. In some cases, the polynucleotides are in double-stranded form and are treated, for example by denaturation, to yield single strands before proceeding with the circularization. In some cases, double-stranded polynucleotides are circularized to yield double-stranded circles and the double-stranded circles are treated, for example by denaturation, to yield single-stranded circles.

In another aspect, the disclosure provides a method of identifying a sequence variant in a nucleic acid sample comprising a plurality of polynucleotides, each polynucleotide of the plurality having a 5′ end and a 3′ end. In some embodiments, the method comprises: (a) circularizing individual polynucleotides of the plurality to form a plurality of circular polynucleotides, wherein a given circular polynucleotide has a junction sequence resulting from said circularization; (b) amplifying the circular polynucleotides of (a) to produce a plurality of amplified polynucleotides, wherein a first amplified polynucleotide of the plurality and a second amplified polynucleotide of the plurality comprise the junction sequence but comprise different sequences at their respective 5′ and/or 3′ ends; (c) sequencing the plurality of amplified polynucleotides and/or amplification products thereof to produce a plurality of sequencing reads corresponding to the first amplified polynucleotide and the second amplified polynucleotide; and (d) calling a sequence difference detected in the sequencing reads as the sequence variant when the sequence difference occurs in sequencing reads corresponding to both the first amplified polynucleotide and the second amplified polynucleotide. In some embodiments, circularizing individual polynucleotides in (a) is effected by a ligase enzyme. In some embodiments, the ligase enzyme is degraded prior to amplifying in (b). Degradation of ligase prior to amplifying in (b) can increase the recovery rate of amplifiable polynucleotides. In some embodiments, the plurality of circularized polynucleotides is not purified or isolated prior to (b).

In some cases, circularizing in (a) comprises the step of joining an adapter polynucleotide to the 5′ end, the 3′ end, or both the 5′ end and the 3′ end of a polynucleotide in the plurality of polynucleotides. As previously described, where the 5′ end and/or 3′ end of a polynucleotide are joined via an adapter polynucleotide, the term “junction” can refer to the junction between the polynucleotide and the adapter (e.g., one of the 5′ end junction or the 3′ end junction), or to the junction between the 5′ end and the 3′ end of the polynucleotide as formed by and including the adapter polynucleotide.
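For the case in which the junction is taken as the join between the 5′ and 3′ ends including any adapter, the junction sequence of a circularized polynucleotide can be sketched as the 3′-terminal bases, followed by the adapter, followed by the 5′-terminal bases. The sketch assumes the polynucleotide is given 5′ to 3′ as a string; the function name and the `flank` width are hypothetical.

```python
def junction_sequence(polynucleotide: str, adapter: str = "", flank: int = 6) -> str:
    """Sequence spanning the ligation point of a circularized polynucleotide.

    Circularization joins the 3' end back to the 5' end (optionally through
    an adapter), so reading across the join gives the last `flank` bases,
    then any adapter, then the first `flank` bases.
    """
    if flank <= 0 or flank > len(polynucleotide):
        raise ValueError("flank must be between 1 and the polynucleotide length")
    return polynucleotide[-flank:] + adapter + polynucleotide[:flank]
```

For example, `junction_sequence("AAACCCGGGTTT", adapter="GATC", flank=3)` returns `"TTTGATCAAA"`. Because sample polynucleotides generally have different ends, their junction sequences differ, which is what allows the junction to mark reads derived from the same circle.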

Following circularization, the circular polynucleotides are amplified. Amplifying the circular polynucleotides in (b) can be effected by a polymerase having strand-displacement activity. In some cases, the polymerase is a Phi29 DNA polymerase. In some cases, amplifying the circular polynucleotides in (b) comprises rolling circle amplification (RCA). Rolling circle amplification can result in amplified polynucleotides comprising linear concatemers of the template circular polynucleotide sequence. In some cases, amplifying in (b) comprises subjecting the circular polynucleotides to an amplification reaction mixture using random primers. Random primers can non-specifically (e.g., randomly) hybridize to the circular polynucleotides during the amplifying of (b). Random primers that non-specifically hybridize to circular polynucleotides can hybridize to a common circular polynucleotide, a plurality of circular polynucleotides, or both. In some cases, two or more random primers hybridize to the same circular polynucleotide (e.g., different regions of the same circular polynucleotide) and yield amplified polynucleotides having repeats of the same target sequence (or subunit sequence). Amplified polynucleotides of the same template (e.g., circular polynucleotide) can have the same junction sequence. In some embodiments, individual random primers comprise sequences at their respective 5′ and/or 3′ ends distinct from each other, and the resulting amplified polynucleotides can have sequences at 5′ and/or 3′ ends distinct from each other. Amplified polynucleotides of the same template, in some cases, have different 5′ and/or 3′ ends, depending on where the primer initially bound and where nucleotide incorporation was terminated.
In some cases, amplifying in (b) comprises subjecting the circular polynucleotides to an amplification reaction mixture comprising target specific primers. Target specific primers can refer to primers targeting particular gene sequences or, in some cases, to primers targeting adapter polynucleotide sequences. Amplified polynucleotides resulting from the use of target specific primers can share a common first end (e.g., primer) and may not share a second end, depending on where nucleotide incorporation was terminated. Amplifying can comprise multiple cycles of denaturation, primer binding, and primer extension. In some cases, the amplified polynucleotides can be subjected to further amplification to yield amplification products of the amplified polynucleotides. Additional amplification can be desirable, for example, to generate a sufficient amount of polynucleotides for downstream analysis, such as sequencing analysis. The resulting amplification products can comprise multiple copies of individual amplified polynucleotides.
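How random priming at different positions of one circle yields amplified polynucleotides that share a junction sequence but differ at their ends can be illustrated with an idealized model of rolling circle amplification. The model ignores strand orientation (the product is written in the same sense as the circle), polymerase errors, and branching; the function name and example sequences are hypothetical.

```python
def rca_product(circle: str, start: int, length: int) -> str:
    """Idealized linear concatemer copied around a circular template.

    Synthesis is modeled as starting at index `start` of the circle (where a
    primer bound) and stopping after `length` nucleotides. Strand orientation,
    polymerase errors, and branching are ignored.
    """
    if not circle:
        raise ValueError("circle must be non-empty")
    n = len(circle)
    return "".join(circle[(start + i) % n] for i in range(length))


circle = "ACGTTGCA"                  # written 5'->3'; its 3' end joins its 5' end
junction = circle[-3:] + circle[:3]  # bases spanning the ligation point: "GCAACG"
first = rca_product(circle, start=2, length=20)
second = rca_product(circle, start=5, length=18)
print(first, second, junction in first and junction in second)
```

Both products contain the junction `GCAACG`, but they begin and end with different bases because synthesis started and stopped at different points, mirroring the first and second amplified polynucleotides of step (b).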

The amplified polynucleotides and/or amplification products thereof can be subsequently sequenced to yield sequencing reads. In some cases, the amplified polynucleotides and/or amplification products are subjected to sequencing without enrichment. However, if desired, enriching one or more target polynucleotides among the amplified polynucleotides and/or amplification products can be performed in an enrichment step prior to sequencing.

Sequencing reads can be grouped into read families. A read family can comprise any suitable number of sequence reads. In some cases, a read family comprises at least 5, 10, 15, 20, 25, 50, 75, or 100 sequence reads. In some cases, a group of sequence reads may not be identified as a read family unless a minimum number of sequence reads are present. For example, a read family can comprise at least 2, 3, 4, 5, 6, 7, 8, 9, or 10 sequence reads. In some cases, a read family comprises at least 25 sequence reads. In some embodiments, the sequence reads of a read family have the same junction sequence. In some embodiments, the sequence reads of a read family have the same sequences at the 5′ and 3′ ends; for example, the sequences may be identical over at least 5 bases, 6 bases, 7 bases, 8 bases, 9 bases, or 10 bases at each of the 5′ and 3′ ends. In some cases, the sequences at the 5′ and 3′ ends are not identical amongst all sequence reads of a read family due to errors resulting from amplification and/or sequencing. The sequencing reads of a read family may exhibit overlap when compared, for example by alignment. In some cases, the sequencing reads of a read family exhibit at least 75% identity when optimally aligned. Two sequencing reads of a family can exhibit at least 75% identity (e.g., at least 80%, 85%, 90%, or 95% identity) over any suitable length of bases, when optimally aligned. A first pair of sequencing reads in a read family can exhibit a % identity that is different from a second pair of sequencing reads in the read family. In some cases, the % identity is determined for an alignment over a length of at least 50 bases (e.g., at least 60 bases, 70 bases, 80 bases, 90 bases, 100 bases, 110 bases, 120 bases, 130 bases, 140 bases, or 150 bases). In some cases, the alignment is over a length of between about 25-250 bases, between about 50-200 bases, between about 75-175 bases, or between about 100-150 bases.
In some cases, the alignment is over the entire length of the test sequence or the comparison sequence. In some embodiments, two sequencing reads of a read family exhibit at least 75% identity (e.g., at least 80%, 85%, 90%, or 95% identity) over a length of at least 50 bases (e.g., at least 60 bases, 70 bases, 80 bases, 90 bases, 100 bases, 110 bases, 120 bases, 130 bases, 140 bases, or 150 bases) when optimally aligned.

Amplified polynucleotides comprising linear concatemers of a shared circular polynucleotide template can yield multiple linear concatemers of the same circular polynucleotide sequence but on multiple, individual molecules. A sequencing read of an amplified polynucleotide or amplification product thereof can in some cases comprise at least one copy of the repeat sequence. In some cases, the sequencing read comprises at least two copies of the repeat sequence (e.g., at least three copies, four copies, or five copies). The average number of copies of the repeat sequence from sequence reads of a read family can depend on the length of the polynucleotides of the nucleic acid sample. For example, a sample comprising relatively longer polynucleotides may result in concatemers with fewer repeats compared to a sample comprising relatively shorter polynucleotides if the concatemers are similar in length.

Sequencing reads can be grouped into read families by first identifying the length and/or sequence of the repeated segment in the concatemer, which corresponds to the sequence of the circular polynucleotide template. In some cases, identifying the length and/or sequence of the repeated segment comprises alignment of reads to other reads or alignment to reference sequences. Next, the junction sequence can be identified, for example by alignment to a reference sequence. The sequences of the 5′ and 3′ ends of the polynucleotide and their relative distances (e.g., in bases) from the junction can be determined. Reads having the same junction sequence and shared sequences at the 5′ and 3′ ends can be grouped into a read family, representing the sequencing reads of amplification products originating from the same amplified polynucleotide, or the same molecular copy of the circular polynucleotide.
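The first step above, identifying the length of the repeated segment, can be approximated for a single read by comparing the read with itself at each candidate shift and taking the smallest shift at which nearly all overlapping bases agree. This is a simplified, hypothetical stand-in for the read-to-read or read-to-reference alignment described above: it assumes substitution errors only (insertions and deletions would shift the repeat phase) and requires at least two copies of the repeat in the read. The 90% agreement threshold and the minimum period are illustrative parameters.

```python
from typing import Optional


def estimate_repeat_length(read: str, min_period: int = 10,
                           max_period: Optional[int] = None,
                           min_match: float = 0.9) -> Optional[int]:
    """Estimate the repeat (template) length of a concatemer read.

    Compares the read with itself shifted by each candidate period and
    returns the smallest period at which at least `min_match` of the
    overlapping positions agree, or None if no period qualifies.
    """
    if max_period is None:
        max_period = len(read) // 2  # at least two copies must be present
    for period in range(min_period, max_period + 1):
        overlap = len(read) - period
        if overlap <= 0:
            break
        matches = sum(read[i] == read[i + period] for i in range(overlap))
        if matches / overlap >= min_match:
            return period
    return None


unit = "ACGTTGCAGTCCAGTA"          # hypothetical 16-base circular template
read = unit * 3 + unit[:5]         # three full copies plus a partial copy
```

Taking the smallest qualifying shift avoids reporting a multiple of the true repeat length, since a read with period 16 also matches itself at shifts of 32, 48, and so on.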

A sequence difference observed in a read family can, in some cases, be called a true sequence difference, as opposed to a result of amplification and/or sequencing error, by confirming that the sequence difference also occurs in a second read family having the same junction sequence but different sequences at the respective 5′ and 3′ ends. Two read families having the same junction sequence but different 5′ and/or 3′ ends can correspond to two amplified polynucleotides of the same circular polynucleotide. Observing the sequence difference in two read families corresponding to the same circular polynucleotide can be one way to confirm that the sequence difference is truly present on the circular polynucleotide and not the result of amplification and/or sequencing error in one of the amplified polynucleotides.

In some cases, a sequence difference observed in sequence reads of a read family is considered a sequence difference of the read family if the sequence difference occurs in a majority of the sequencing reads of the read family. In some cases, the sequence difference observed in sequence reads of the read family is considered a sequence difference of the read family if the sequence difference occurs in at least 50% of sequencing reads of the read family (e.g., at least 60%, 70%, 80%, 90%, or 95% of sequencing reads). In some cases, the sequence difference observed in sequence reads of the read family is considered a sequence difference of the read family if the sequence difference occurs in 100% of sequencing reads of the read family. In some cases, a sequence difference detected in the sequencing reads is called as the sequence variant when the sequence difference occurs in a majority of the sequencing reads from a first amplified polynucleotide and a majority of sequencing reads from a second amplified polynucleotide. In some cases, a sequence difference detected in the sequencing reads is called as the sequence variant when the sequence difference occurs in at least 50% of the sequencing reads (e.g., at least 60%, 70%, 80%, 90%, or 95% of sequencing reads) from the first amplified polynucleotide and at least 50% of sequencing reads (e.g., at least 60%, 70%, 80%, 90%, or 95% of sequencing reads) from the second amplified polynucleotide. In some cases, a sequence difference detected in the sequencing reads is called as the sequence variant when the sequence difference occurs in 100% of the sequencing reads from the first amplified polynucleotide and 100% of sequencing reads from the second amplified polynucleotide.

In practicing the methods described herein, variant detection in a sample comprising a plurality of polynucleotides can be improved. In some cases, the error rate of variant detection is reduced. In some embodiments, the error rate of variant detection is reduced by at least 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, or 50%. In some cases, the sensitivity and/or specificity of variant detection is increased. In some embodiments, the sensitivity of variant detection is increased by at least 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, or 50%. In some embodiments, the specificity of variant detection is increased by at least 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, or 50%. In some cases, the false positive rate is decreased.

The sequence variant, as described further herein, can be any variation with respect to the reference sequence. Non-limiting examples of sequence variants that can be detected using methods herein include single nucleotide polymorphisms (SNP), deletion/insertion polymorphisms (DIP), copy number variants (CNV), short tandem repeats (STR), simple sequence repeats (SSR), variable number of tandem repeats (VNTR), amplified fragment length polymorphisms (AFLP), retrotransposon-based insertion polymorphisms, sequence specific amplified polymorphism, and differences in epigenetic marks that can be detected as sequence variants (e.g. methylation differences). In some cases, the sequence variant is a polymorphism, such as a single-nucleotide polymorphism.

The nucleic acid sample can be a sample from a subject. In some cases, the sample is from a human subject. In some cases, the sample comprises urine, stool, blood, saliva, tissue, or bodily fluid from a subject, such as a human subject. In some cases, the sample comprises tumor cells. In some cases, the sample comprises a formalin-fixed paraffin-embedded sample. In some cases, the plurality of polynucleotides of the sample comprises cell-free polynucleotides. The cell-free polynucleotides may comprise cell-free DNA, and in some cases, circulating tumor DNA. The cell-free polynucleotides may comprise cell-free RNA, and in some cases, circulating tumor RNA.

As previously described, the plurality of polynucleotides can be single-stranded. In some cases, the polynucleotides are in double-stranded form and are treated, for example by denaturation, to yield single strands before proceeding with the circularization. In some cases, double-stranded polynucleotides are circularized to yield double-stranded circles and the double-stranded circles are treated, for example by denaturation, to yield single-stranded circles.

In another aspect, the disclosure provides a method of performing rolling circle amplification, such as in a nucleic acid sample comprising a plurality of polynucleotides. In some embodiments, each polynucleotide of the plurality has a 5′ end and a 3′ end, and the method comprises: (a) circularizing individual polynucleotides of the plurality to form a plurality of circular polynucleotides using a ligase enzyme, each polynucleotide having a junction between the 5′ end and 3′ end; (b) degrading the ligase enzyme; and (c) amplifying the circular polynucleotides of (a) after degrading the ligase enzyme, wherein polynucleotides are not purified or isolated between steps (a) and (c). In some embodiments, the method comprises additional steps of (d) sequencing the amplified polynucleotides to produce a plurality of sequencing reads; (e) identifying sequence differences between sequencing reads and a reference sequence; and (f) calling a sequence difference that occurs in at least two circular polynucleotides having different junctions as the sequence variant. In some embodiments, the method comprises identifying sequence differences between sequencing reads and a reference sequence, and calling a sequence difference that occurs in at least two circular polynucleotides having different junctions as the sequence variant, wherein: (a) the sequencing reads correspond to amplification products of the at least two circular polynucleotides; and (b) each of the at least two circular polynucleotides comprises a different junction formed by ligating a 5′ end and 3′ end of the respective polynucleotides.
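Step (f) can be sketched as counting, for each sequence difference, the number of distinct junctions (and therefore distinct circular polynucleotides) on which it was observed. The input format, in which each observation is a hypothetical (junction, difference) pair, and the function name are assumptions for illustration.

```python
from collections import defaultdict


def call_by_distinct_junctions(observations, min_junctions=2):
    """Call differences observed on circles with at least `min_junctions` junctions.

    `observations` is an iterable of (junction, difference) pairs, where
    `difference` is any hashable description such as (position, ref, alt).
    Repeated observations from the same junction count once, because they
    derive from the same circular polynucleotide.
    """
    junctions_per_difference = defaultdict(set)
    for junction, difference in observations:
        junctions_per_difference[difference].add(junction)
    return {diff for diff, junctions in junctions_per_difference.items()
            if len(junctions) >= min_junctions}


observations = [
    ("GGATCC", (101, "G", "T")),
    ("GGATCC", (101, "G", "T")),
    ("AAGCTT", (101, "G", "T")),   # same change on a second circle
    ("GGATCC", (205, "C", "A")),
    ("GGATCC", (205, "C", "A")),   # many reads, but only one circle
]
```

The change at position 205 is seen repeatedly but only on one circle, so it is not called; the change at position 101 is supported by two different junctions and is called.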

In another aspect, the disclosure provides a method of performing rolling circle amplification, such as in a nucleic acid sample comprising a plurality of polynucleotides. In some embodiments, each polynucleotide of the plurality has a 5′ end and a 3′ end, and the method comprises: (a) circularizing individual polynucleotides of the plurality using a ligase enzyme to form a plurality of circular polynucleotides, each polynucleotide having a junction between the 5′ end and 3′ end; (b) degrading the ligase enzyme; (c) amplifying the circular polynucleotides of (a) after degrading the ligase enzyme to produce amplified polynucleotides, wherein polynucleotides are not purified or isolated between steps (a) and (c); and (d) shearing the amplified polynucleotides to produce sheared polynucleotides, each sheared polynucleotide comprising one or more shear points at a 5′ end and/or a 3′ end. In some embodiments, the method comprises additional steps of (e) sequencing the sheared polynucleotides to produce a plurality of sequencing reads; (f) identifying sequence differences between sequencing reads and a reference sequence; and (g) calling a sequence difference as the sequence variant when the sequence difference occurs in at least two different sheared polynucleotides. Degradation of ligase prior to amplifying in (c) can increase the recovery rate of amplifiable polynucleotides.

In some embodiments, the method comprises identifying sequence differences between sequencing reads and a reference sequence, and calling a sequence difference that occurs in at least two circular polynucleotides having different junctions as the sequence variant, wherein: (a) the sequencing reads correspond to amplification products of the at least two circular polynucleotides; and (b) each of the at least two circular polynucleotides comprises a different junction formed by ligating a 5′ end and 3′ end of the respective polynucleotides. In some embodiments, the sequence difference is called as the sequence variant when, in addition, (i) the sequence difference occurs in at least two circular polynucleotides having different junctions; (ii) the sequence difference is identified on both strands of a double-stranded input molecule; and/or (iii) the sequence difference occurs in a consensus sequence for a concatemer formed by amplification comprising rolling circle amplification.

In general, the term “sequence variant” refers to any variation in sequence relative to one or more reference sequences. Typically, the sequence variant occurs with a lower frequency than the reference sequence for a given population of individuals for whom the reference sequence is known. For example, a particular bacterial genus may have a consensus reference sequence for the 16S rRNA gene, but individual species within that genus may have one or more sequence variants within the gene (or a portion thereof) that are useful in identifying that species in a population of bacteria. As a further example, sequences for multiple individuals of the same species (or multiple sequencing reads for the same individual) may produce a consensus sequence when optimally aligned, and sequence variants with respect to that consensus may be used to identify mutants in the population indicative of dangerous contamination. In general, a “consensus sequence” refers to a nucleotide sequence that reflects the most common choice of base at each position in the sequence where the series of related nucleic acids has been subjected to intensive mathematical and/or sequence analysis, such as optimal sequence alignment according to any of a variety of sequence alignment algorithms. A variety of alignment algorithms are available, some of which are described herein. In some embodiments, the reference sequence is a single known reference sequence, such as the genomic sequence of a single individual. In some embodiments, the reference sequence is a consensus sequence formed by aligning multiple known sequences, such as the genomic sequence of multiple individuals serving as a reference population, or multiple sequencing reads of polynucleotides from the same individual. In some embodiments, the reference sequence is a consensus sequence formed by optimally aligning the sequences from a sample under analysis, such that a sequence variant represents a variation relative to corresponding sequences in the same sample.
In some embodiments, the sequence variant occurs with a low frequency in the population (also referred to as a “rare” sequence variant). For example, the sequence variant may occur with a frequency of about or less than about 5%, 4%, 3%, 2%, 1.5%, 1%, 0.75%, 0.5%, 0.25%, 0.1%, 0.075%, 0.05%, 0.04%, 0.03%, 0.02%, 0.01%, 0.005%, 0.001%, or lower. In some embodiments, the sequence variant occurs with a frequency of about or less than about 0.1%.
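A consensus reference formed from a sample's own aligned sequences, together with a check of whether a variant falls at or below a rare-variant frequency threshold, might be sketched as follows. The sketch assumes the sequences are already optimally aligned to equal length, measures frequency as the fraction of those sequences carrying the alternate base (an assumption made for illustration), and uses 0.1%, one of the example frequencies above, as a default threshold. All names are hypothetical.

```python
from collections import Counter


def consensus(aligned_sequences):
    """Most common base at each column of equal-length aligned sequences.

    Ties are broken by first occurrence, following Counter.most_common.
    """
    if len({len(s) for s in aligned_sequences}) != 1:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*aligned_sequences))


def variant_frequency(aligned_sequences, position, alt_base):
    """Fraction of sequences carrying `alt_base` at 0-based `position`."""
    carriers = sum(1 for s in aligned_sequences if s[position] == alt_base)
    return carriers / len(aligned_sequences)


def is_rare(frequency, threshold=0.001):
    """True when the frequency is at or below `threshold` (0.1% by default)."""
    return frequency <= threshold


# 1 of 2,000 aligned sequences carries G->T at position 2 (0.05%).
sequences = ["ACGT"] * 1999 + ["ACTT"]
reference = consensus(sequences)
frequency = variant_frequency(sequences, 2, "T")
```

Because the reference here is built from the sample itself, the rare `T` is outvoted at position 2 and the consensus keeps `G`, so the variant is expressed relative to the rest of the same sample.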

A sequence variant can be any variation with respect to a reference sequence. A sequence variation may consist of a change in, insertion of, or deletion of a single nucleotide, or of a plurality of nucleotides (e.g. 2, 3, 4, 5, 6, 7, 8, 9, 10, or more nucleotides). Where a sequence variant comprises two or more nucleotide differences, the nucleotides that are different may be contiguous with one another, or discontinuous. Non-limiting examples of types of sequence variants include single nucleotide polymorphisms (SNP), deletion/insertion polymorphisms (DIP), copy number variants (CNV), short tandem repeats (STR), simple sequence repeats (SSR), variable number of tandem repeats (VNTR), amplified fragment length polymorphisms (AFLP), retrotransposon-based insertion polymorphisms, sequence specific amplified polymorphism, and differences in epigenetic marks that can be detected as sequence variants (e.g. methylation differences).
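The categories of change, insertion, and deletion of one or more nucleotides, contiguous or not, can be illustrated with a small classifier that compares reference and alternate alleles. The sketch assumes both alleles are given without shared padding bases and that equal-length alleles are already aligned position by position; the function name and the extra "complex" label for length-changing substitutions are hypothetical.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a simple variant from its reference and alternate alleles.

    Assumes alleles carry no shared padding bases (for example, VCF-style
    anchor bases removed) and handles the change, insertion, and deletion
    categories described above.
    """
    if ref == alt:
        raise ValueError("reference and alternate alleles are identical")
    if not ref:
        return "insertion"
    if not alt:
        return "deletion"
    if len(ref) == len(alt):
        differing = [i for i, (r, a) in enumerate(zip(ref, alt)) if r != a]
        if len(differing) == 1:
            return "single-nucleotide substitution"
        contiguous = differing[-1] - differing[0] + 1 == len(differing)
        return ("contiguous multi-nucleotide substitution" if contiguous
                else "non-contiguous multi-nucleotide substitution")
    return "complex (substitution with length change)"
```

For example, `classify_variant("GAT", "TCT")` reports a contiguous multi-nucleotide substitution, whereas `classify_variant("GAT", "TAC")` reports a non-contiguous one because the unchanged middle base separates the two differences.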

Nucleic acid samples that may be subjected to methods described herein can be derived from any suitable source. In some embodiments, the samples used are environmental samples. Environmental samples may be from any environmental source, for example, naturally occurring or artificial atmosphere, water systems, soil, or any other sample of interest. In some embodiments, the environmental samples may be obtained from, for example, atmospheric pathogen collection systems, sub-surface sediments, groundwater, ancient water deep within the ground, plant root-soil interface of grassland, coastal water and sewage treatment plants.

Polynucleotides from a sample may be any of a variety of polynucleotides, including but not limited to, DNA, RNA, ribosomal RNA (rRNA), transfer RNA (tRNA), micro RNA (miRNA), messenger RNA (mRNA), fragments of any of these, or combinations of any two or more of these. In some embodiments, samples comprise DNA. In some embodiments, samples comprise genomic DNA. In some embodiments, samples comprise mitochondrial DNA, chloroplast DNA, plasmid DNA, bacterial artificial chromosomes, yeast artificial chromosomes, oligonucleotide tags, or combinations thereof. In some embodiments, the samples comprise DNA generated by amplification, such as by primer extension reactions using any suitable combination of primers and a DNA polymerase, including but not limited to polymerase chain reaction (PCR), reverse transcription, and combinations thereof. Where the template for the primer extension reaction is RNA, the product of reverse transcription is referred to as complementary DNA (cDNA). Primers useful in primer extension reactions can comprise sequences specific to one or more targets, random sequences, partially random sequences, and combinations thereof. In general, sample polynucleotides comprise any polynucleotide present in a sample, which may or may not include target polynucleotides. The polynucleotides may be single-stranded, double-stranded, or a combination of these. In some embodiments, polynucleotides subjected to a method of the disclosure are single-stranded polynucleotides, which may or may not be in the presence of double-stranded polynucleotides. In some embodiments, the polynucleotides are single-stranded DNA. Single-stranded DNA (ssDNA) may be ssDNA that is isolated in a single-stranded form, or DNA that is isolated in double-stranded form and subsequently made single-stranded for the purpose of one or more steps in a method of the disclosure.

In some embodiments, polynucleotides are subjected to subsequent steps (e.g. circularization and amplification) without an extraction step, and/or without a purification step. For example, a fluid sample may be treated to remove cells without an extraction step to produce a purified fluid sample and a cell sample, followed by isolation of DNA from the purified fluid sample. A variety of procedures for isolation of polynucleotides are available, such as by precipitation or non-specific binding to a substrate followed by washing the substrate to release bound polynucleotides. Where polynucleotides are isolated from a sample without a cellular extraction step, polynucleotides will largely be extracellular or “cell-free” polynucleotides, such as cell-free DNA and cell-free RNA, which may correspond to dead or damaged cells. The identity of such cells may be used to characterize the cells or population of cells from which they are derived, such as tumor cells (e.g. in cancer detection), fetal cells (e.g. in prenatal diagnostics), cells from transplanted tissue (e.g. in early detection of transplant failure), or members of a microbial community.

If a sample is treated to extract polynucleotides, such as from cells in a sample, a variety of extraction methods are available. For example, nucleic acids can be purified by organic extraction with phenol, phenol/chloroform/isoamyl alcohol, or similar formulations, including TRIzol and TriReagent. Other non-limiting examples of extraction techniques include: (1) organic extraction followed by ethanol precipitation, e.g., using a phenol/chloroform organic reagent (Ausubel et al., 1993), with or without the use of an automated nucleic acid extractor, e.g., the Model 341 DNA Extractor available from Applied Biosystems (Foster City, Calif.); (2) stationary phase adsorption methods (U.S. Pat. No. 5,234,809; Walsh et al., 1991); and (3) salt-induced nucleic acid precipitation methods (Miller et al., 1988), such precipitation methods being typically referred to as “salting-out” methods. Another example of nucleic acid isolation and/or purification includes the use of magnetic particles to which nucleic acids can specifically or non-specifically bind, followed by isolation of the beads using a magnet, and washing and eluting the nucleic acids from the beads (see e.g. U.S. Pat. No. 5,705,628). In some embodiments, the above isolation methods may be preceded by an enzyme digestion step to help eliminate unwanted protein from the sample, e.g., digestion with proteinase K, or other like proteases. See, e.g., U.S. Pat. No. 7,001,724. If desired, RNase inhibitors may be added to the lysis buffer. For certain cell or sample types, it may be desirable to add a protein denaturation/digestion step to the protocol. Purification methods may be directed to isolate DNA, RNA, or both. When both DNA and RNA are isolated together during or subsequent to an extraction procedure, further steps may be employed to purify one or both separately from the other. Sub-fractions of extracted nucleic acids can also be generated, for example, by purification by size, sequence, or other physical or chemical characteristic.
In addition to an initial nucleic acid isolation step, purification of nucleic acids can be performed after any step in the disclosed methods, such as to remove excess or unwanted reagents, reactants, or products. A variety of methods for determining the amount and/or purity of nucleic acids in a sample are available, such as by absorbance (e.g. absorbance of light at 260 nm, 280 nm, and a ratio of these) and detection of a label (e.g. fluorescent dyes and intercalating agents, such as SYBR green, SYBR blue, DAPI, propidium iodide, Hoechst stain, SYBR gold, ethidium bromide).

Where desired, polynucleotides from a sample may be fragmented prior to further processing. Fragmentation may be accomplished by any of a variety of methods, including chemical, enzymatic, and mechanical fragmentation. In some embodiments, the fragments have an average or median length of about 10 to about 1,000 nucleotides, such as between 10-800, 10-500, 50-500, 90-200, or 50-150 nucleotides. In some embodiments, the fragments have an average or median length of about or less than about 100, 200, 300, 500, 600, 800, 1000, or 1500 nucleotides. In some embodiments, the fragments range from about 90-200 nucleotides, and/or have an average length of about 150 nucleotides. In some embodiments, the fragmentation is accomplished mechanically and comprises subjecting sample polynucleotides to acoustic sonication. In some embodiments, the fragmentation comprises treating the sample polynucleotides with one or more enzymes under conditions suitable for the one or more enzymes to generate double-stranded nucleic acid breaks. Examples of enzymes useful in the generation of polynucleotide fragments include sequence specific and non-sequence specific nucleases. Non-limiting examples of nucleases include DNase I, Fragmentase, restriction endonucleases, variants thereof, and combinations thereof. For example, digestion with DNase I can induce random double-stranded breaks in DNA in the absence of Mg++ and in the presence of Mn++. In some embodiments, fragmentation comprises treating the sample polynucleotides with one or more restriction endonucleases. Fragmentation can produce fragments having 5′ overhangs, 3′ overhangs, blunt ends, or a combination thereof. In some embodiments, such as when fragmentation comprises the use of one or more restriction endonucleases, cleavage of sample polynucleotides leaves overhangs having a predictable sequence.
Fragmented polynucleotides may be subjected to a step of size selecting the fragments via standard methods such as column purification or isolation from an agarose gel.
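In silico, size selection amounts to keeping fragments whose lengths fall within a window. A minimal sketch, assuming fragment lengths are already available as integers and using the 90-200 nucleotide range mentioned above as a default; `size_select` and `length_summary` are hypothetical names, not part of any described protocol.

```python
from statistics import mean, median


def size_select(lengths, min_len=90, max_len=200):
    """Keep fragment lengths inside [min_len, max_len], mimicking gel or column size selection."""
    return [n for n in lengths if min_len <= n <= max_len]


def length_summary(lengths):
    """Count, mean, and median of a list of fragment lengths (in nucleotides)."""
    return {"count": len(lengths), "mean": mean(lengths), "median": median(lengths)}
```

Applied to lengths of 40, 95, 150, 180, 210, and 600 nucleotides, the default window keeps 95, 150, and 180, with a median of 150.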

According to some embodiments, polynucleotides among the plurality of polynucleotides from a sample are circularized. Circularization can include joining the 5′ end of a polynucleotide to the 3′ end of the same polynucleotide, to the 3′ end of another polynucleotide in the sample, or to the 3′ end of a polynucleotide from a different source (e.g. an artificial polynucleotide, such as an oligonucleotide adapter). In some embodiments, the 5′ end of a polynucleotide is joined to the 3′ end of the same polynucleotide (also referred to as “self-joining”). In some embodiments, conditions of the circularization reaction are selected to favor self-joining of polynucleotides within a particular range of lengths, so as to produce a population of circularized polynucleotides of a particular average length. For example, circularization reaction conditions may be selected to favor self-joining of polynucleotides shorter than about 5000, 2500, 1000, 750, 500, 400, 300, 200, 150, 100, 50, or fewer nucleotides in length. In some embodiments, fragments having lengths between 50-5000 nucleotides, 100-2500 nucleotides, or 150-500 nucleotides are favored, such that the average length of circularized polynucleotides falls within the respective range. In some embodiments, 80% or more of the circularized fragments are between 50-500 nucleotides in length, such as between 50-200 nucleotides in length. Reaction conditions that may be optimized include the length of time allotted for a joining reaction, the concentration of various reagents, and the concentration of polynucleotides to be joined. In some embodiments, a circularization reaction preserves the distribution of fragment lengths present in a sample prior to circularization. For example, one or more of the mean, median, mode, and standard deviation of fragment lengths in a sample before circularization and of circularized polynucleotides are within 75%, 80%, 85%, 90%, 95%, or more of one another.
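One way to check whether circularization has preserved the fragment-length distribution is to compare summary statistics before and after. The sketch below reads "within 90% of one another" as "the smaller value is at least 90% of the larger", which is an interpretive assumption; the mode is omitted because it is poorly defined for multimodal or sparse length data. The function names are hypothetical.

```python
from statistics import mean, median, pstdev


def within_fraction(a, b, fraction):
    """True if the smaller of two non-negative values is at least `fraction` of the larger."""
    low, high = sorted((a, b))
    if high == 0:
        return low == 0
    return low / high >= fraction


def distribution_preserved(before, after, fraction=0.90):
    """Compare mean, median, and population standard deviation of two length lists."""
    stats = {"mean": mean, "median": median, "stdev": pstdev}
    return {name: within_fraction(f(before), f(after), fraction) for name, f in stats.items()}
```

With lengths of 100, 150, and 200 before and 110, 150, and 190 after, the means and medians agree exactly, while the population standard deviation shrinks by about 20%, so that statistic passes a 75% criterion but not a 90% one.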

In some cases, rather than preferentially forming self-joining circularization products, one or more adapter oligonucleotides are used, such that the 5′ end and 3′ end of a polynucleotide in the sample are joined by way of one or more intervening adapter oligonucleotides to form a circular polynucleotide. For example, the 5′ end of a polynucleotide can be joined to the 3′ end of an adapter, and the 5′ end of the same adapter can be joined to the 3′ end of the same polynucleotide. An adapter oligonucleotide includes any oligonucleotide having a sequence, at least a portion of which is known, that can be joined to a sample polynucleotide. Adapter oligonucleotides can comprise DNA, RNA, nucleotide analogues, non-canonical nucleotides, labeled nucleotides, modified nucleotides, or combinations thereof. Adapter oligonucleotides can be single-stranded, double-stranded, or partial duplex. In general, a partial-duplex adapter comprises one or more single-stranded regions and one or more double-stranded regions. Double-stranded adapters can comprise two separate oligonucleotides hybridized to one another (also referred to as an “oligonucleotide duplex”), and hybridization may leave one or more blunt ends, one or more 3′ overhangs, one or more 5′ overhangs, one or more bulges resulting from mismatched and/or unpaired nucleotides, or any combination of these. When two hybridized regions of an adapter are separated from one another by a non-hybridized region, a “bubble” structure results. Adapters of different kinds can be used in combination, such as adapters of different sequences. Different adapters can be joined to sample polynucleotides in sequential reactions or simultaneously. In some embodiments, identical adapters are added to both ends of a target polynucleotide. For example, first and second adapters can be added to the same reaction. Adapters can be manipulated prior to combining with sample polynucleotides. For example, terminal phosphates can be added or removed.

Where adapter oligonucleotides are used, the adapter oligonucleotides can contain one or more of a variety of sequence elements, including but not limited to, one or more amplification primer annealing sequences or complements thereof, one or more sequencing primer annealing sequences or complements thereof, one or more barcode sequences, one or more common sequences shared among multiple different adapters or subsets of different adapters, one or more restriction enzyme recognition sites, one or more overhangs complementary to one or more target polynucleotide overhangs, one or more probe binding sites (e.g. for attachment to a sequencing platform, such as a flow cell for massively parallel sequencing, such as flow cells as developed by Illumina, Inc.), one or more random or near-random sequences (e.g. one or more nucleotides selected at random from a set of two or more different nucleotides at one or more positions, with each of the different nucleotides selected at one or more positions represented in a pool of adapters comprising the random sequence), and combinations thereof. In some cases, the adapters may be used to purify those circles that contain the adapters, for example by using beads (particularly magnetic beads for ease of handling) coated with oligonucleotides comprising a sequence complementary to the adapter: the beads “capture” closed circles carrying the correct adapters by hybridization, circles that do not contain the adapters and any unligated components are washed away, and the captured circles are then released from the beads. In addition, in some cases, the complex of the hybridized capture probe and the target circle can be directly used to generate concatemers, such as by direct rolling circle amplification (RCA). In some embodiments, the adapters in the circles can also be used as sequencing primers.
Two or more sequence elements can be non-adjacent to one another (e.g. separated by one or more nucleotides), adjacent to one another, partially overlapping, or completely overlapping. For example, an amplification primer annealing sequence can also serve as a sequencing primer annealing sequence. Sequence elements can be located at or near the 3′ end, at or near the 5′ end, or in the interior of the adapter oligonucleotide. A sequence element may be of any suitable length, such as about or less than about 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50 or more nucleotides in length. Adapter oligonucleotides can have any suitable length, at least sufficient to accommodate the one or more sequence elements of which they are comprised. In some embodiments, adapters are about or less than about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, 200, or more nucleotides in length. In some embodiments, an adapter oligonucleotide is in the range of about 12 to 40 nucleotides in length, such as about 15 to 35 nucleotides in length.

In some embodiments, the adapter oligonucleotides joined to fragmented polynucleotides from one sample comprise one or more sequences common to all adapter oligonucleotides and a barcode that is unique to the adapters joined to polynucleotides of that particular sample, such that the barcode sequence can be used to distinguish polynucleotides originating from one sample or adapter joining reaction from polynucleotides originating from another sample or adapter joining reaction. In some embodiments, an adapter oligonucleotide comprises a 5′ overhang, a 3′ overhang, or both, that are complementary to one or more target polynucleotide overhangs. Complementary overhangs can be one or more nucleotides in length, including but not limited to 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more nucleotides in length. Complementary overhangs may comprise a fixed sequence. Complementary overhangs of an adapter oligonucleotide may comprise a random sequence of one or more nucleotides, such that one or more nucleotides are selected at random from a set of two or more different nucleotides at one or more positions, with each of the different nucleotides selected at one or more positions represented in a pool of adapters with complementary overhangs comprising the random sequence. In some embodiments, an adapter overhang is complementary to a target polynucleotide overhang produced by restriction endonuclease digestion. In some embodiments, an adapter overhang consists of an adenine or a thymine.
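Downstream, a sample barcode lets reads from pooled reactions be sorted back to their source. A minimal sketch, assuming the barcode occupies a fixed number of bases at the start of each read and must match exactly; actual adapter layouts and error-tolerant matching vary, and `demultiplex` is a hypothetical helper.

```python
def demultiplex(reads, barcode_to_sample, barcode_length):
    """Group reads by an exact-match barcode assumed to occupy the first `barcode_length` bases."""
    by_sample = {sample: [] for sample in barcode_to_sample.values()}
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            # Barcode not recognized: keep the whole read aside for inspection.
            unassigned.append(read)
        else:
            # Strip the barcode so only the sample-derived sequence remains.
            by_sample[sample].append(insert)
    return by_sample, unassigned
```

Keeping unrecognized reads separate, rather than discarding them, makes it possible to check how many reads fail barcode matching.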

A variety of methods for circularizing polynucleotides are available. FIGS. 28A-28E illustrate non-limiting examples of methods for circularizing polynucleotides. In some embodiments, circularization comprises an enzymatic reaction, such as use of a ligase (e.g. an RNA or DNA ligase). A variety of ligases are available, including, but not limited to, CircLigase™ (Epicentre; Madison, WI), RNA ligase, and T4 RNA Ligase 1 (ssRNA Ligase, which works on both DNA and RNA). In addition, T4 DNA ligase can also ligate ssDNA if no dsDNA templates are present, although this is generally a slow reaction. Other non-limiting examples of ligases include NAD-dependent ligases including Taq DNA ligase, Thermus filiformis DNA ligase, Escherichia coli DNA ligase, Tth DNA ligase, Thermus scotoductus DNA ligase (I and II), thermostable ligase, Ampligase thermostable DNA ligase, VanC-type ligase, 9°N DNA ligase, Tsp DNA ligase, and novel ligases discovered by bioprospecting; ATP-dependent ligases including T4 RNA ligase, T4 DNA ligase, T3 DNA ligase, T7 DNA ligase, Pfu DNA ligase, DNA ligase I, DNA ligase III, DNA ligase IV, and novel ligases discovered by bioprospecting; and wild-type, mutant isoforms, and genetically engineered variants thereof. Where self-joining is desired, the concentration of polynucleotides and enzyme can be adjusted to facilitate the formation of intramolecular circles rather than intermolecular structures. Reaction temperatures and times can be adjusted as well. In some embodiments, 60° C. is used to facilitate intramolecular circles. In some embodiments, reaction times are between 12-16 hours. Reaction conditions may be those specified by the manufacturer of the selected enzyme. In some embodiments, an exonuclease step can be included to digest any unligated nucleic acids after the circularization reaction.
This works because closed circles do not contain a free 5′ or 3′ end; thus, the introduction of a 5′ or 3′ exonuclease will not digest the closed circles but will digest the unligated components. This may find particular use in multiplex systems.

In general, joining ends of a polynucleotide to one another to form a circular polynucleotide (either directly, or with one or more intermediate adapter oligonucleotides) produces a junction having a junction sequence. Where the 5′ end and 3′ end of a polynucleotide are joined via an adapter polynucleotide, the term “junction” can refer to a junction between the polynucleotide and the adapter (e.g. one of the 5′ end junction or the 3′ end junction), or to the junction between the 5′ end and the 3′ end of the polynucleotide as formed by and including the adapter polynucleotide. Where the 5′ end and the 3′ end of a polynucleotide are joined without an intervening adapter (e.g. the 5′ end and 3′ end of a single-stranded DNA), the term “junction” refers to the point at which these two ends are joined. A junction may be identified by the sequence of nucleotides comprising the junction (also referred to as the “junction sequence”). In some embodiments, samples comprise polynucleotides having a mixture of ends formed by natural degradation processes (such as cell lysis, cell death, and other processes by which DNA is released from a cell to its surrounding environment in which it may be further degraded, such as in cell-free polynucleotides, such as cell-free DNA and cell-free RNA), fragmentation that is a byproduct of sample processing (such as fixing, staining, and/or storage procedures), and fragmentation by methods that cleave DNA without restriction to specific target sequences (e.g. mechanical fragmentation, such as by sonication; non-sequence specific nuclease treatment, such as with DNase I or Fragmentase). Where samples comprise polynucleotides having a mixture of ends, the likelihood that two polynucleotides will have the same 5′ end or 3′ end is low, and the likelihood that two polynucleotides will independently have both the same 5′ end and 3′ end is extremely low.
Accordingly, in some embodiments, junctions may be used to distinguish different polynucleotides, even where the two polynucleotides comprise a portion having the same target sequence. Where polynucleotide ends are joined without an intervening adapter, a junction sequence may be identified by alignment to a reference sequence. For example, where the order of two component sequences appears to be reversed with respect to the reference sequence, the point at which the reversal appears to occur may be an indication of a junction at that point. Where polynucleotide ends are joined via one or more adapter sequences, a junction may be identified by proximity to the known adapter sequence, or by alignment as above if a sequencing read is of sufficient length to obtain sequence from both the 5′ and 3′ ends of the circularized polynucleotide. In some embodiments, the formation of a particular junction is a sufficiently rare event such that it is unique among the circularized polynucleotides of a sample.
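The adapter-free junction identification described above, and the use of junctions to tell polynucleotides apart, can be sketched with exact string matching. `find_self_junction` looks for the apparent reversal of component order in a read, and `call_supported_variants` keeps a sequence difference only if it is seen on circles with at least two distinct junctions, in the spirit of the method summarized in the abstract. Both are hypothetical, naive sketches: they assume error-free reads, a single strand, and a reference in which the matched substrings are unique, where a practical implementation would rely on a proper aligner.

```python
from collections import defaultdict


def find_self_junction(read, reference, min_len=4):
    """Locate an adapter-free junction by exact matching (naive sketch).

    Splits the read into left and right parts; if both occur in the reference
    and the right part maps upstream of the left part, the order appears
    reversed, marking where a fragment's 3' end was joined to its own 5' end.
    Returns (fragment_end, fragment_start) as zero-based reference
    coordinates (end exclusive), or None if no reversal is found.
    """
    for split in range(min_len, len(read) - min_len + 1):
        left, right = read[:split], read[split:]
        left_pos = reference.find(left)
        right_pos = reference.find(right)
        if left_pos != -1 and right_pos != -1 and right_pos < left_pos:
            return (left_pos + split, right_pos)
    return None


def call_supported_variants(observations, min_junctions=2):
    """Keep variants observed on at least `min_junctions` distinct junctions.

    `observations` is an iterable of (junction, variant) pairs; repeated reads
    from the same circle share a junction and therefore count only once.
    """
    support = defaultdict(set)
    for junction, variant in observations:
        support[variant].add(junction)
    return {variant for variant, junctions in support.items() if len(junctions) >= min_junctions}
```

For example, with the reference `GATTACACGTTAGCCATGCA`, the read `AGCCAACACG` splits into `AGCCA` (reference position 11) followed by `ACACG` (position 4), so it is reported as the junction that joins the end of a fragment spanning positions 4 to 16 back to its start. A difference seen only on reads sharing that one junction would not be called, because those reads could all derive from a single circle.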

FIGS. 4A-4C illustrate three non-limiting examples of methods of circularizing polynucleotides. At the top (FIG. 4A), the polynucleotides are circularized in the absence of adapters, while the middle scheme (FIG. 4B) depicts the use of adapters, and the bottom scheme (FIG. 4C) utilizes two adapters. Where two adapters are used, one can be joined to the 5′ end of the polynucleotide while the second adapter can be joined to the 3′ end of the same polynucleotide. In some embodiments, adapter ligation may comprise use of two different adapters along with a “splint” nucleic acid that is complementary to the two adapters to facilitate ligation. Forked or “Y” adapters may also be used. Where two adapters are used, polynucleotides having the same adapter at both ends may be removed in subsequent steps due to self-annealing. FIGS. 1-3 depict embodiments of methods according to the present disclosure wherein polynucleotides are circularized in the absence of adapters (FIG. 1) and in the presence of adapters (FIGS. 2 and 3). Circularized polynucleotides with adapters (FIGS. 2 and 3) can be amplified by rolling circle amplification (RCA) using target-specific primers (FIG. 2) or primers which hybridize to the adapter sequences (FIG. 3).

FIGS. 6A-6B illustrate further non-limiting example methods of circularizing polynucleotides, such as single-stranded DNA. The adapter can be asymmetrically added to either the 5′ or 3′ end of a polynucleotide. As shown in FIG. 6A, the single-stranded DNA (ssDNA) can have a free hydroxyl group at the 3′ end, and the adapter can have a blocked 3′ end such that in the presence of a ligase, a preferred reaction joins the 3′ end of the ssDNA to the 5′ end of the adapter. In this embodiment, it can be useful to use agents such as polyethylene glycols (PEGs) to drive the intermolecular ligation of a single ssDNA fragment and a single adapter, prior to an intramolecular ligation to form a circle. The reverse order of ends can also be used (blocked 3′, free 5′, etc.). Once the linear ligation is accomplished, the ligated pieces can be treated with an enzyme to remove the blocking moiety, such as through the use of a kinase or other suitable enzymes or chemistries. Once the blocking moiety is removed, the addition of a circularization enzyme, such as CircLigase, allows an intramolecular reaction to form the circularized polynucleotide. As shown in FIG. 6B, by using a double-stranded adapter with one strand having a 5′ or 3′ end blocked, a double-stranded structure can be formed, which upon ligation produces a double-stranded fragment with nicks. The two strands can then be separated, the blocking moiety removed, and the single-stranded fragment circularized to form a circularized polynucleotide. In some cases, as illustrated in FIG. 8, double-stranded DNA (dsDNA) is circularized to yield a circularized, double-stranded circle. The double-stranded circle can be denatured to allow for primer binding and amplification of both strands.

In some embodiments, molecular clamps are used to bring two ends of a polynucleotide (e.g. a single-stranded DNA) together in order to enhance the rate of intramolecular circularization. An example illustration of one such process is provided in FIG. 5. This can be done with or without adapters. The use of molecular clamps may be particularly useful in cases where the average polynucleotide fragment is greater than about 100 nucleotides in length. In some embodiments, the molecular clamp probe comprises three domains: a first domain, an intervening domain, and a second domain. The first and second domains will hybridize to corresponding sequences in a target polynucleotide via sequence complementarity. The intervening domain of the molecular clamp probe may not significantly hybridize with the target sequence. The hybridization of the clamp with the target polynucleotide thus can bring the two ends of the target sequence into closer proximity, which facilitates the intramolecular circularization of the target sequence in the presence of a circularization enzyme. In some embodiments, the molecular clamp is additionally useful because it can also serve as an amplification primer.

After circularization, ligation enzymes are removed from reaction products using a protein degradation step. In some embodiments, protein degradation comprises treatment to remove or degrade ligase used in the circularization reaction. In some embodiments, treatment to degrade ligase comprises treatment with a protease, such as proteinase K. Proteinase K treatment may follow manufacturer protocols or standard protocols (e.g. as provided in Sambrook and Green, Molecular Cloning: A Laboratory Manual, 4th Edition (2012)). In some embodiments, protein degradation comprises treatment with a low pH or acidic solution or buffer. In some embodiments, protein degradation comprises heating the reaction, for example heating the reaction above 55° C., above 60° C., above 65° C., above 70° C., or higher. In some embodiments, linear polynucleotides are degraded after circularization. In some embodiments, linear polynucleotides are degraded using an exonuclease. In some embodiments, the exonuclease comprises a lambda exonuclease. In some embodiments, the exonuclease comprises a RecJf nuclease. In some embodiments, an exonuclease is selected from at least one of ExoI, ExoIII, ExoV, ExoVII, and ExoT.

Circularization may be followed directly by sequencing the circularized polynucleotides. Alternatively, sequencing may be preceded by one or more amplification reactions. In general, “amplification” refers to a process by which one or more copies are made of a target polynucleotide or a portion thereof. A variety of methods of amplifying polynucleotides (e.g. DNA and/or RNA) are available. Amplification may be linear, exponential, or involve both linear and exponential phases in a multi-phase amplification process. Amplification methods may involve changes in temperature, such as a heat denaturation step, or may be isothermal processes that do not require heat denaturation. The polymerase chain reaction (PCR) uses multiple cycles of denaturation, annealing of primer pairs to opposite strands, and primer extension to exponentially increase copy numbers of the target sequence. Denaturation of annealed nucleic acid strands may be achieved by the application of heat, increasing local metal ion concentrations (e.g. U.S. Pat. No. 6,277,605), ultrasound radiation (e.g. WO/2000/049176), application of voltage (e.g. U.S. Pat. Nos. 5,527,670, 6,033,850, 5,939,291, and 6,333,157), and application of an electromagnetic field in combination with primers bound to a magnetically-responsive material (e.g. U.S. Pat. No. 5,545,540). In a variation called RT-PCR, reverse transcriptase (RT) is used to make a complementary DNA (cDNA) from RNA, and the cDNA is then amplified by PCR to produce multiple copies of DNA (e.g. U.S. Pat. Nos. 5,322,770 and 5,310,652).
One example of an isothermal amplification method is strand displacement amplification, commonly referred to as SDA, which uses cycles of annealing pairs of primer sequences to opposite strands of a target sequence, primer extension in the presence of a dNTPαS to produce a duplex hemiphosphorothioated primer extension product, endonuclease-mediated nicking of a hemimodified restriction endonuclease recognition site, and polymerase-mediated primer extension from the 3′ end of the nick to displace an existing strand and produce a strand for the next round of primer annealing, nicking and strand displacement, resulting in geometric amplification of product (e.g. U.S. Pat. Nos. 5,270,184 and 5,455,166). Thermophilic SDA (tSDA) uses thermophilic endonucleases and polymerases at higher temperatures in essentially the same method (European Pat. No. 0 684 315). Other amplification methods include rolling circle amplification (RCA) (e.g., Lizardi, “Rolling Circle Replication Reporter Systems,” U.S. Pat. No. 5,854,033); helicase dependent amplification (HDA) (e.g., Kong et al., “Helicase Dependent Amplification Nucleic Acids,” U.S. Pat. Appln. Pub. No. US 2004-0058378 A1); and loop-mediated isothermal amplification (LAMP) (e.g., Notomi et al., “Process for Synthesizing Nucleic Acid,” U.S. Pat. No. 6,410,278). In some cases, isothermal amplification utilizes transcription by an RNA polymerase from a promoter sequence, such as may be incorporated into an oligonucleotide primer. Transcription-based amplification methods include nucleic acid sequence based amplification, also referred to as NASBA (e.g. U.S. Pat. No. 5,130,238); methods which rely on the use of an RNA replicase to amplify the probe molecule itself, commonly referred to as Qβ replicase (e.g., Lizardi, P. et al. (1988) BioTechnol. 6, 1197-1202); self-sustained sequence replication (e.g., Guatelli, J. et al. (1990) Proc. Natl. Acad. Sci. USA 87, 1874-1878; Landgren (1993) Trends in Genetics 9, 199-202; and HELEN H. LEE et al., NUCLEIC ACID AMPLIFICATION TECHNOLOGIES (1997)); and methods for generating additional transcription templates (e.g. U.S. Pat. Nos. 5,480,784 and 5,399,491). Further methods of isothermal nucleic acid amplification include the use of primers containing non-canonical nucleotides (e.g. uracil or RNA nucleotides) in combination with an enzyme that cleaves nucleic acids at the non-canonical nucleotides (e.g. DNA glycosylase or RNaseH) to expose binding sites for additional primers (e.g. U.S. Pat. Nos. 6,251,639, 6,946,251, and 7,824,890). Isothermal amplification processes can be linear or exponential.
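The exponential growth that sets PCR apart from linear amplification follows from each cycle at most doubling the number of target copies. Under the idealized assumption of a constant per-cycle efficiency E between 0 and 1 (a simplification, since real reactions plateau), N0 starting copies yield N0(1 + E)^n copies after n cycles, as in this sketch with a hypothetical `pcr_copies` helper.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized exponential PCR yield: N = N0 * (1 + E) ** n."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    # Each cycle multiplies the copy number by (1 + E); E = 1 means perfect doubling.
    return initial_copies * (1.0 + efficiency) ** cycles
```

At perfect efficiency, one template yields 2^10 = 1,024 copies after 10 cycles; at 90% efficiency, 100 templates yield about 686 copies after 3 cycles.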

In some embodiments, amplification comprises rolling circle amplification (RCA). A typical RCA reaction mixture comprises one or more primers, a polymerase, and dNTPs, and produces concatemers. Typically, the polymerase in an RCA reaction is a polymerase having strand-displacement activity. A variety of such polymerases are available, non-limiting examples of which include exonuclease-minus DNA Polymerase I large (Klenow) Fragment, Phi29 DNA polymerase, Taq DNA Polymerase and the like. In general, a concatemer is a polynucleotide amplification product comprising two or more copies of a target sequence from a template polynucleotide (e.g. about or more than about 2, 3, 4, 5, 6, 7, 8, 9, 10, or more copies of the target sequence; in some embodiments, about or more than about 2 copies). Amplification primers may be of any suitable length, such as about or at least about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, or more nucleotides, any portion or all of which may be complementary to the corresponding target sequence to which the primer hybridizes (e.g. about, or at least about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, or more nucleotides). FIGS. 7A-7C depict three non-limiting examples of suitable primers. FIG. 7A shows the use of no adapters and a target-specific primer, which can be used for the detection of the presence or absence of a sequence variant within specific target sequences. In some embodiments, multiple target-specific primers for a plurality of targets are used in the same reaction. For example, target-specific primers for about or at least about 10, 50, 100, 150, 200, 250, 300, 400, 500, 1000, 2500, 5000, 10000, 15000, or more different target sequences may be used in a single amplification reaction in order to amplify a corresponding number of target sequences (if present) in parallel. Multiple target sequences may correspond to different portions of the same gene, different genes, or non-gene sequences.
Where multiple primers target multiple target sequences in a single gene, primers may be spaced along the gene sequence (e.g. spaced apart by about or at least about 50 nucleotides, every 50-150 nucleotides, or every 50-100 nucleotides) in order to cover all or a specified portion of a target gene. FIG. 7C illustrates use of a primer that hybridizes to an adapter sequence (which in some cases may be an adapter oligonucleotide itself).
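As a rough illustration of the spacing scheme above, the Python sketch below tiles candidate primer start positions along a gene at a fixed interval. The function name `tile_primer_starts`, the default 20-nt primer length, and the use of 0-based coordinates are assumptions made for illustration only; practical primer design would additionally weigh melting temperature, specificity, and secondary structure.

```python
def tile_primer_starts(gene_length: int, spacing: int = 100,
                       primer_length: int = 20) -> list[int]:
    """Return 0-based start positions for primers tiled along a gene.

    Hypothetical helper: a primer is placed every `spacing` nucleotides, and
    only positions where a full-length primer still fits inside the gene are kept.
    """
    if primer_length <= 0 or gene_length < primer_length:
        raise ValueError("gene must be at least as long as one primer")
    if not 50 <= spacing <= 150:
        raise ValueError("spacing outside the 50-150 nt range used in this example")
    last_start = gene_length - primer_length  # last position a full primer fits
    return list(range(0, last_start + 1, spacing))


if __name__ == "__main__":
    starts = tile_primer_starts(1000, spacing=100, primer_length=20)
    print(len(starts), starts[:3], starts[-1])  # 10 [0, 100, 200] 900
```

Fixed spacing is the simplest way to guarantee that no stretch of the gene longer than the chosen interval is left without a nearby primer.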

FIG. 7B illustrates an example of amplification by random primers. In general, a random primer comprises one or more random or near-random sequences (e.g. one or more nucleotides selected at random from a set of two or more different nucleotides at one or more positions, with each of the different nucleotides selected at one or more positions represented in a pool of adapters comprising the random sequence). In this way, polynucleotides (e.g. all or substantially all circularized polynucleotides) can be amplified in a sequence non-specific fashion. Such procedures may be referred to as “whole genome amplification” (WGA); however, typical WGA protocols (which do not involve a circularization step) do not efficiently amplify short polynucleotides, such as the polynucleotide fragments contemplated by the present disclosure. For further illustrative discussion of WGA procedures, see for example Li et al. (2006) J. Mol. Diagn. 8(1):22-30.

Where circularized polynucleotides are amplified prior to sequencing, amplified products may be subjected to sequencing directly without enrichment, or subsequent to one or more enrichment steps. Enrichment may comprise purifying one or more reaction components, such as by retention of amplification products or removal of one or more reagents. For example, amplification products may be purified by hybridization to a plurality of probes attached to a substrate, followed by release of captured polynucleotides, such as by a washing step. Alternatively, amplification products can be labeled with a member of a binding pair, followed by binding to the other member of the binding pair attached to a substrate, and washing to release the amplification product. Possible substrates include, but are not limited to, glass and modified or functionalized glass, plastics (including acrylics, polystyrene and copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes, Teflon™, etc.), polysaccharides, nylon or nitrocellulose, ceramics, resins, silica or silica-based materials including silicon and modified silicon, carbon, metals, inorganic glasses, optical fiber bundles, and a variety of other polymers. In some embodiments, the substrate is in the form of a bead or other small, discrete particle, which may be a magnetic or paramagnetic bead to facilitate isolation through application of a magnetic field. In general, “binding pair” refers to one of a first and a second moiety, wherein the first and the second moiety have a specific binding affinity for each other.
Suitable binding pairs include, but are not limited to, antigens/antibodies (for example, digoxigenin/anti-digoxigenin, dinitrophenyl (DNP)/anti-DNP, dansyl-X/anti-dansyl, fluorescein/anti-fluorescein, lucifer yellow/anti-lucifer yellow, and rhodamine/anti-rhodamine); biotin/avidin (or biotin/streptavidin); calmodulin binding protein (CBP)/calmodulin; hormone/hormone receptor; lectin/carbohydrate; peptide/cell membrane receptor; protein A/antibody; hapten/antihapten; enzyme/cofactor; and enzyme/substrate.

In some embodiments, enrichment following amplification of circularized polynucleotides comprises one or more additional amplification reactions. In some embodiments, enrichment comprises amplifying a target sequence comprising sequence A and sequence B (oriented in a 5′ to 3′ direction) in an amplification reaction mixture comprising (a) the amplified polynucleotide; (b) a first primer comprising sequence A′, wherein the first primer specifically hybridizes to sequence A of the target sequence via sequence complementarity between sequence A and sequence A′; (c) a second primer comprising sequence B, wherein the second primer specifically hybridizes to sequence B′ present in a complementary polynucleotide comprising a complement of the target sequence via sequence complementarity between B and B′; and (d) a polymerase that extends the first primer and the second primer to produce amplified polynucleotides; wherein the distance between the 5′ end of sequence A and the 3′ end of sequence B of the target sequence is 75 nt or less. FIG. 10 illustrates an example arrangement of the first and second primers with respect to a target sequence in the context of a single repeat (which will typically not be amplified unless circular) and concatemers comprising multiple copies of the target sequence. Given the orientation of the primers with respect to a monomer of the target sequence, this arrangement may be referred to as “back to back” (B2B) or “inverted” primers. Amplification with B2B primers facilitates enrichment of circular and/or concatemeric amplification products. Moreover, this orientation combined with a relatively smaller footprint (the total distance spanned by a pair of primers) permits amplification of a wider variety of fragmentation events around a target sequence, as a junction is less likely to occur between primers than in the arrangement of primers found in a typical amplification reaction (facing one another, spanning a target sequence).
Additional embodiments and advantages of back to back primers are illustrated in FIGS. 13A-13C.

In some embodiments, the distance between the 5′ end of sequence A and the 3′ end of sequence B is about or less than about 200, 150, 100, 75, 50, 40, 30, 25, 20, 15, or fewer nucleotides. In some embodiments, sequence A is the complement of sequence B. In some embodiments, multiple pairs of B2B primers directed to a plurality of different target sequences are used in the same reaction to amplify a plurality of different target sequences in parallel (e.g. about or at least about 10, 50, 100, 150, 200, 250, 300, 400, 500, 1000, 2500, 5000, 10000, 15000, or more different target sequences). Primers can be of any suitable length, such as described elsewhere herein. Amplification may comprise any suitable amplification reaction under appropriate conditions, such as an amplification reaction described herein. In some embodiments, amplification is a polymerase chain reaction.
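The footprint constraint can be sketched as follows, under a simplified model (an assumption, not taken from the disclosure) in which a circularized fragment supports B2B amplification only if the fragment contains the whole region from the 5′ end of sequence A to the 3′ end of sequence B, so that no junction falls between the primers. All function names are hypothetical.

```python
def b2b_footprint(a_start: int, b_end: int) -> int:
    """Span from the 5' end of sequence A to the 3' end of sequence B.

    Coordinates are 0-based on the target strand; `b_end` is exclusive.
    """
    if b_end <= a_start:
        raise ValueError("sequence B must lie 3' of sequence A")
    return b_end - a_start


def fragment_supports_b2b(frag_start: int, frag_end: int,
                          a_start: int, b_end: int) -> bool:
    """True if fragment [frag_start, frag_end) contains the whole primer footprint,
    so the circle made from it carries A and B with no junction between them."""
    return frag_start <= a_start and b_end <= frag_end


def usable_start_offsets(fragment_length: int, span: int) -> int:
    """Number of start offsets at which a fragment of `fragment_length` nt
    fully contains a region `span` nt long."""
    return max(0, fragment_length - span + 1)
```

Under this model, a hypothetical 160-nt fragment can contain a 40-nt B2B footprint at 121 different start offsets, but a 150-nt region spanned by conventional facing primers at only 11, which illustrates why a smaller footprint tolerates a wider variety of fragmentation events.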

In some embodiments, B2B primers comprise at least two sequence elements: a first element that hybridizes to a target sequence via sequence complementarity, and a 5′ “tail” that does not hybridize to the target sequence during a first amplification phase at a first hybridization temperature during which the first element hybridizes (e.g. due to lack of sequence complementarity between the tail and the portion of the target sequence immediately 3′ with respect to where the first element binds). For example, the first primer comprises sequence C 5′ with respect to sequence A′, the second primer comprises sequence D 5′ with respect to sequence B, and neither sequence C nor sequence D hybridizes to the plurality of concatemers during a first amplification phase at a first hybridization temperature. In some embodiments in which such tailed primers are used, amplification can comprise a first phase and a second phase. The first phase comprises a hybridization step at a first temperature, during which the first and second primers hybridize to the concatemers (or circularized polynucleotides), and primer extension. The second phase comprises a hybridization step at a second temperature that is higher than the first temperature, during which the first and second primers hybridize to amplification products comprising extended first or second primers, or complements thereof, and primer extension. The higher temperature favors hybridization of the full primer (first element plus tail) to complementary sites in primer extension products over hybridization of the first element alone to an internal target sequence within a concatemer, which would produce shorter fragments. Accordingly, the two-phase amplification may be used to reduce the extent to which short amplification products might otherwise be favored, thereby maintaining a relatively higher proportion of amplification products having two or more copies of a target sequence. For example, after 5 cycles (e.g. at least 5, 6, 7, 8, 9, 10, 15, 20, or more cycles) of hybridization at the second temperature and primer extension, at least 5% (e.g. at least 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, or more) of amplified polynucleotides in the reaction mixture comprise two or more copies of the target sequence. An embodiment of this two-phase, tailed B2B primer amplification process is illustrated in FIGS. 11A-11D.
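The temperature logic of the two-phase scheme can be illustrated with the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a crude approximation for short oligonucleotides and is used here as an assumption in place of a proper nearest-neighbour model. The sketch picks a first-phase temperature below the estimated Tm of the first element alone and a second-phase temperature above it but well below the estimated Tm of the tailed primer; the function names and the 5 °C margin are hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Crude melting-temperature estimate in deg C: 2*(A+T) + 4*(G+C).

    Rough approximation only; ignores salt, primer concentration and
    nearest-neighbour stacking effects.
    """
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def two_phase_annealing(first_element: str, tail: str,
                        margin: float = 5.0) -> tuple[float, float]:
    """Pick illustrative annealing temperatures for a tailed B2B primer.

    Phase 1 anneals below the Tm of the first element alone, so that element
    can bind concatemers. Phase 2 anneals above that Tm but below the Tm of
    tail + first element, so stable binding needs the full primer site that
    exists only in extension products.
    """
    tm_element = wallace_tm(first_element)
    tm_full = wallace_tm(tail + first_element)  # tail sits 5' of the first element
    if tm_full - tm_element <= 2 * margin:
        raise ValueError("tail adds too little stability to separate the phases")
    return tm_element - margin, tm_element + margin
```

For example, `two_phase_annealing("GATCGATCGATCGATC", "GGCCGGCC")` returns `(43.0, 53.0)`: the 16-nt first element has an estimated Tm of 48 °C and the tailed 24-mer about 80 °C, so only the full primer site present in extension products should remain stably bound at 53 °C.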

In some embodiments, enrichment comprises amplification under conditions that are skewed to increase the length of amplicons from concatemers. For example, the primer concentration can be lowered, such that not every priming site will bind a primer, thus making the PCR products longer. Similarly, decreasing the primer hybridization time during the cycles allows fewer primers to hybridize, which also increases the average PCR amplicon size. Furthermore, increasing the temperature and/or extension time of the cycles may similarly increase the average length of the PCR amplicons. Any combination of these techniques can be used.

In some embodiments, particularly where an amplification with B2B primers has been performed, amplification products are filtered on the basis of size to reduce the number of, or eliminate, monomers in a mixture comprising concatemers. This can be done using a variety of available techniques, including, but not limited to, fragment excision from gels and gel filtration (e.g. to enrich for fragments larger than about 300, 400, 500, or more nucleotides in length), as well as SPRI beads (Agencourt AMPure XP) for size selection by fine-tuning the binding buffer concentration. For example, the use of 0.6× binding buffer during mixing with DNA fragments may be used to preferentially bind DNA fragments larger than about 500 base pairs (bp).
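A minimal sketch of the effect of size selection follows. It assumes an idealized, perfectly sharp 500-nt cutoff (bead-based selection is in practice gradual) and estimates copy number as amplicon length divided by monomer length; both assumptions and the function names are illustrative.

```python
def size_select(amplicon_lengths: list[int], min_length: int = 500) -> list[int]:
    """Keep amplicons longer than `min_length` (idealized, perfectly sharp cutoff)."""
    return [n for n in amplicon_lengths if n > min_length]


def fraction_multicopy(amplicon_lengths: list[int], monomer_length: int) -> float:
    """Fraction of amplicons long enough to hold two or more target copies."""
    if not amplicon_lengths:
        return 0.0
    multi = sum(1 for n in amplicon_lengths if n // monomer_length >= 2)
    return multi / len(amplicon_lengths)
```

With hypothetical amplicon lengths of 150, 160, 320, 480, 650, and 800 nt and a 160-nt monomer, four of six amplicons hold two or more copies before selection and both survivors do afterwards, at the cost of discarding the 320- and 480-nt concatemers.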

In some embodiments, where amplification results in single-stranded concatemers, the single strands are converted to double-stranded constructs either prior to or as part of the formation of sequencing libraries that are generated for sequencing reactions. A variety of suitable methods to generate a double-stranded construct from a single-stranded nucleic acid are available. A number of possible methods are depicted in FIGS. 9A-9D, although other methods can be used as well. As shown in FIG. 9A, for example, the use of random primers, polymerase, dNTPs, and a ligase will result in double strands. FIG. 9B depicts second-strand synthesis when the concatemer contains adapter sequences, which can be used as the primers in the reaction. FIG. 9C depicts the use of a “loop,” where one terminus of the loop adapter is added to the terminus of the concatemers, wherein the loop adapter has a small section of self-hybridizing nucleic acids. In this case, ligation of the loop adapter results in a loop that is self-hybridized and serves as a primer for the polymerase. FIG. 9D shows the use of hyper-branching primers, generally of most use in cases where the target sequence is known, where multiple strands are formed, particularly when a polymerase with a strong strand displacement function is used.

According to some embodiments, circularized polynucleotides (or amplification products thereof, which may optionally have been enriched) are subjected to a sequencing reaction to generate sequencing reads. Sequencing reads produced by such methods may be used in accordance with other methods disclosed herein. A variety of sequencing methodologies are available, particularly high-throughput sequencing methodologies. Examples include, without limitation, sequencing systems manufactured by Illumina (sequencing systems such as HiSeq® and MiSeq®), Life Technologies (Ion Torrent®, SOLiD®, etc.), Roche's 454 Life Sciences systems, Pacific Biosciences systems, etc. In some embodiments, sequencing comprises use of HiSeq® or MiSeq® systems to produce reads of about or more than about 50, 75, 100, 125, 150, 175, 200, 250, 300, or more nucleotides in length. In some embodiments, sequencing comprises a sequencing-by-synthesis process, where individual nucleotides are identified iteratively as they are added to the growing primer extension product. Pyrosequencing is an example of a sequencing-by-synthesis process that identifies the incorporation of a nucleotide by assaying the resulting synthesis mixture for the presence of a by-product of the sequencing reaction, namely pyrophosphate. In particular, a primer/template/polymerase complex is contacted with a single type of nucleotide. If that nucleotide is incorporated, the polymerization reaction cleaves the nucleoside triphosphate between the α and β phosphates of the triphosphate chain, releasing pyrophosphate. The presence of released pyrophosphate is then identified using a chemiluminescent enzyme reporter system that converts the pyrophosphate, with AMP, into ATP, then measures ATP using a luciferase enzyme to produce measurable light signals.
Where light is detected, the base is incorporated; where no light is detected, the base is not incorporated. Following appropriate washing steps, the various bases are cyclically contacted with the complex to sequentially identify subsequent bases in the template sequence. See, e.g., U.S. Pat. No. 6,210,891.
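The flow-by-flow logic of pyrosequencing can be sketched in Python. This is an idealized model, not instrument software: it assumes the light signal in each flow is proportional to the number of identical bases incorporated (so homopolymers give proportionally larger signals), and the flow order `TACG` and the function names are illustrative assumptions.

```python
def flowgram(synthesized: str, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Idealized pyrosequencing signals.

    `synthesized` is the sequence of bases added to the growing strand (the
    complement of the template, in the direction of synthesis). One nucleotide
    type is dispensed per flow, cycling through `flow_order`; each flow reports
    how many bases were incorporated (0 means no light, i.e. not incorporated).
    """
    if set(synthesized) - set(flow_order):
        raise ValueError("every base must appear in the flow order")
    signals = []
    i = flow = 0
    while i < len(synthesized):
        base = flow_order[flow % len(flow_order)]
        run = 0
        # A single flow extends through an entire homopolymer run.
        while i < len(synthesized) and synthesized[i] == base:
            run += 1
            i += 1
        signals.append((base, run))
        flow += 1
    return signals


def decode(signals: list[tuple[str, int]]) -> str:
    """Rebuild the synthesized sequence from (base, count) flow signals."""
    return "".join(base * count for base, count in signals)
```

For instance, `flowgram("TTAG")` yields `[("T", 2), ("A", 1), ("C", 0), ("G", 1)]`: two T's give a double-strength signal, and the C flow produces no light because no C is incorporated.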

In related sequencing processes, the primer/template/polymerase complex is immobilized upon a substrate and the complex is contacted with labeled nucleotides. The immobilization of the complex may be through the primer sequence, the template sequence, and/or the polymerase enzyme, and may be covalent or noncovalent. For example, immobilization of the complex can be via a linkage between the polymerase or the primer and the substrate surface. In alternate configurations, the nucleotides are provided with or without removable terminator groups. Upon incorporation, the label is coupled with the complex and is thus detectable. In the case of terminator-bearing nucleotides, all four different nucleotides, bearing individually identifiable labels, are contacted with the complex. Incorporation of the labeled nucleotide arrests extension, by virtue of the presence of the terminator, and adds the label to the complex, allowing identification of the incorporated nucleotide. The label and terminator are then removed from the incorporated nucleotide, and following appropriate washing steps, the process is repeated. In the case of non-terminated nucleotides, a single type of labeled nucleotide is added to the complex to determine whether it will be incorporated, as with pyrosequencing. Following removal of the label group on the nucleotide and appropriate washing steps, the various different nucleotides are cycled through the reaction mixture in the same process. See, e.g., U.S. Pat. No. 6,833,246, incorporated herein by reference in its entirety for all purposes. For example, the Illumina Genome Analyzer System is based on technology described in WO 98/44151, wherein DNA molecules are bound to a sequencing platform (flow cell) via an anchor probe binding site (otherwise referred to as a flow cell binding site) and amplified in situ on a glass slide.
A solid surface on which DNA molecules are amplified typically comprises a plurality of first and second bound oligonucleotides, the first complementary to a sequence near or at one end of a target polynucleotide and the second complementary to a sequence near or at the other end of a target polynucleotide. This arrangement permits bridge amplification, such as described in US20140121116. The DNA molecules are then annealed to a sequencing primer and sequenced in parallel base-by-base using a reversible terminator approach. Hybridization of a sequencing primer may be preceded by cleavage of one strand of a double-stranded bridge polynucleotide at a cleavage site in one of the bound oligonucleotides anchoring the bridge, thus leaving one strand not bound to the solid substrate, which may be removed by denaturing, and the other strand bound and available for hybridization to a sequencing primer. Typically, the Illumina Genome Analyzer System utilizes flow cells with 8 channels, generating sequencing reads of 18 to 36 bases in length and >1.3 Gbp of high-quality data per run (see www.illumina.com).

In yet a further sequencing-by-synthesis process, the incorporation of differently labeled nucleotides is observed in real time as template-dependent synthesis is carried out. In particular, an individual immobilized primer/template/polymerase complex is observed as fluorescently labeled nucleotides are incorporated, permitting real-time identification of each base as it is added. In this process, label groups are attached to a portion of the nucleotide that is cleaved during incorporation. For example, by attaching the label group to a portion of the phosphate chain removed during incorporation, i.e., a β, γ, or other terminal phosphate group on a nucleoside polyphosphate, the label is not incorporated into the nascent strand, and instead, natural DNA is produced. Observation of individual molecules typically involves the optical confinement of the complex within a very small illumination volume. By optically confining the complex, one creates a monitored region in which randomly diffusing nucleotides are present for a very short period of time, while incorporated nucleotides are retained within the observation volume for longer as they are being incorporated. This results in a characteristic signal associated with the incorporation event, which is also characterized by a signal profile that is characteristic of the base being added. In related aspects, interacting label components, such as fluorescence resonance energy transfer (FRET) dye pairs, are provided upon the polymerase or other portion of the complex and the incorporating nucleotide, such that the incorporation event puts the labeling components in interactive proximity, and a characteristic signal results that is, again, also characteristic of the base being incorporated (see, e.g., U.S. Pat. Nos. 6,917,726, 7,033,764, 7,052,847, 7,056,676, 7,170,050, 7,361,466, and 7,416,844; and US20070134128).

In some embodiments, the nucleic acids in the sample can be sequenced by ligation. This method typically uses a DNA ligase enzyme to identify the target sequence, for example, as used in the polony method and in the SOLiD technology (Applied Biosystems, now Invitrogen). In general, a pool of all possible oligonucleotides of a fixed length is provided, labeled according to the sequenced position. Oligonucleotides are annealed and ligated; the preferential ligation by DNA ligase for matching sequences results in a signal corresponding to the complementary sequence at that position.

In some embodiments, sequencing libraries are constructed from the amplified DNA concatemers prior to sequencing analysis. The amplified DNA concatemers can be simultaneously fragmented and tagged with sequencing adapters as illustrated in FIG. 12A. In some cases, the amplified DNA concatemers are fragmented, for example by sonication, and adapters are added to both ends of the fragments as illustrated in FIG. 12B.

According to some embodiments, a sequence difference between sequencing reads and a reference sequence is called as a genuine sequence variant (e.g. existing in the sample prior to amplification or sequencing, and not a result of either of these processes) if it occurs in at least two different polynucleotides (e.g. two different circular polynucleotides, which can be distinguished as a result of having different junctions). Because sequence variants that are the result of amplification or sequencing errors are unlikely to be duplicated exactly (e.g. in position and type) on two different polynucleotides comprising the same target sequence, adding this validation parameter greatly reduces the background of erroneous sequence variants, with a concurrent increase in the sensitivity and accuracy of detecting actual sequence variation in a sample. In some embodiments, a sequence variant having a frequency of about or less than about 5%, 4%, 3%, 2%, 1.5%, 1%, 0.75%, 0.5%, 0.25%, 0.1%, 0.075%, 0.05%, 0.04%, 0.03%, 0.02%, 0.01%, 0.005%, 0.001%, or lower is sufficiently above background to permit an accurate call. In some embodiments, the sequence variant occurs with a frequency of about or less than about 0.1%. In some embodiments, the frequency of a sequence variant is sufficiently above background when such frequency is statistically significantly above the background error rate (e.g. with a p-value of about or less than about 0.05, 0.01, 0.001, 0.0001, or lower). In some embodiments, the frequency of a sequence variant is sufficiently above background when such frequency is about or at least about 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 25-fold, 50-fold, 100-fold, or more above the background error rate (e.g. at least 5-fold higher). In some embodiments, the background error rate in accurately determining the sequence at a given position is about or less than about 1%, 0.5%, 0.1%, 0.05%, 0.01%, 0.005%, 0.001%, 0.0005%, or lower.
In some embodiments, the error rate is lower than 0.001%.
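The calling rule described above can be sketched as follows. Reads are assumed to carry a junction identifier (for example, the breakpoint coordinates at which a read wraps around its circular template); reads sharing a junction are treated as copies of one circle. The sketch requires support from at least two distinct circles and, as in the embodiments using significance and fold-change thresholds, compares the variant frequency with an assumed per-position background error rate using a one-sided binomial test. The thresholds (two molecules, p ≤ 0.01, at least 5-fold above background) are example values drawn from the ranges above, and all names are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binomial_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if not 0.0 < p < 1.0:
        raise ValueError("background rate must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)


def call_variant(supporting_junctions, covering_junctions, background_rate,
                 min_molecules=2, alpha=0.01, min_fold=5.0) -> bool:
    """Decide whether a candidate sequence difference is a genuine variant.

    supporting_junctions: junction IDs of reads carrying the difference.
    covering_junctions: junction IDs of all reads covering the position.
    Reads that share a junction come from the same circle and count once.
    """
    support = len(set(supporting_junctions))
    depth = len(set(covering_junctions))
    if depth == 0 or support < min_molecules:
        return False
    frequency = support / depth
    p_value = binomial_sf(support, depth, background_rate)
    return frequency >= min_fold * background_rate and p_value <= alpha
```

With 1,000 distinct circles covering a position and a background rate of 0.01%, support from three distinct junctions is called, whereas five reads from a single circle are not, however many times that circle was sequenced.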

In some embodiments, identifying a genuine sequence variant (also referred to as “calling” or “making a call”) comprises optimally aligning one or more sequencing reads with a reference sequence to identify differences between the two, as well as to identify junctions. In general, alignment involves placing one sequence along another sequence, iteratively introducing gaps along each sequence, scoring how well the two sequences match, and preferably repeating for various positions along the reference. The best-scoring match is deemed to be the alignment and represents an inference about the degree of relationship between the sequences. In some embodiments, a reference sequence to which sequencing reads are compared is a reference genome, such as the genome of a member of the same species as the subject. A reference genome may be complete or incomplete. In some embodiments, a reference genome consists only of regions containing target polynucleotides, such as from a reference genome or from a consensus generated from sequencing reads under analysis. In some embodiments, a reference sequence comprises or consists of sequences of polynucleotides of one or more organisms, such as sequences from one or more bacteria, archaea, viruses, protists, fungi, or other organisms. In some embodiments, the reference sequence consists of only a portion of a reference genome, such as regions corresponding to one or more target sequences under analysis (e.g. one or more genes, or portions thereof). For example, for detection of a pathogen (such as in the case of contamination detection), the reference genome is the entire genome of the pathogen (e.g. HIV, HPV, or a harmful bacterial strain, e.g. E. coli), or a portion thereof useful in identification, such as of a particular strain or serotype. In some embodiments, sequencing reads are aligned to multiple different reference sequences, such as to screen for multiple different organisms or strains.

In a typical alignment, a base in a sequencing read alongside a non-matching base in the reference indicates that a substitution mutation has occurred at that point. Similarly, where one sequence includes a gap alongside a base in the other sequence, an insertion or deletion mutation (an “indel”) is inferred to have occurred. When it is desired to specify that one sequence is being aligned to one other, the alignment is sometimes called a pairwise alignment. Multiple sequence alignment generally refers to the alignment of two or more sequences, including, for example, by a series of pairwise alignments. In some embodiments, scoring an alignment involves setting values for the probabilities of substitutions and indels. When individual bases are aligned, a match or mismatch contributes to the alignment score by a substitution probability, which could be, for example, 1 for a match and 0.33 for a mismatch. An indel deducts from an alignment score by a gap penalty, which could be, for example, −1. Gap penalties and substitution probabilities can be based on empirical knowledge or a priori assumptions about how sequences mutate. Their values affect the resulting alignment. Examples of algorithms for performing alignments include, without limitation, the Smith-Waterman (SW) algorithm, the Needleman-Wunsch (NW) algorithm, algorithms based on the Burrows-Wheeler Transform (BWT), and hash function aligners such as Novoalign (Novocraft Technologies; available at www.novocraft.com), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). One exemplary alignment program, which implements a BWT approach, is Burrows-Wheeler Aligner (BWA), available from the SourceForge web site maintained by Geeknet (Fairfax, Va.). BWT typically occupies 2 bits of memory per nucleotide, making it possible to index nucleotide sequences as long as 4G base pairs with a typical desktop or laptop computer.
The pre-processing includes the construction of the BWT (i.e., indexing the reference) and the supporting auxiliary data structures. BWA includes two different algorithms, both based on BWT. Alignment by BWA can proceed using the algorithm bwa-short, designed for short queries up to about 200 bp with a low error rate (<3%) (Li H. and Durbin R. Bioinformatics, 25:1754-60 (2009)). The second algorithm, BWA-SW, is designed for long reads with more errors (Li H. and Durbin R. (2010). Fast and accurate long-read alignment with Burrows-Wheeler Transform. Bioinformatics, Epub.). The bwa-sw aligner is sometimes referred to as “bwa-long”, “bwa long algorithm”, or similar. An alignment program that implements a version of the Smith-Waterman algorithm is MUMmer, available from the SourceForge web site maintained by Geeknet (Fairfax, Va.). MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form (Kurtz, S., et al., Genome Biology, 5:R12 (2004); Delcher, A. L., et al., Nucl. Acids Res., 27:11 (1999)). For example, MUMmer 3.0 can find all 20-base-pair or longer exact matches between a pair of 5-megabase genomes in 13.7 seconds, using 78 MB of memory, on a 2.4 GHz Linux desktop computer. MUMmer can also align incomplete genomes; it can easily handle the 100s or 1000s of contigs from a shotgun sequencing project, and will align them to another set of contigs or a genome using the NUCmer program included with the system. Other non-limiting examples of alignment programs include: BLAT from Kent Informatics (Santa Cruz, Calif.) (Kent, W. J., Genome Research 12: 656-664 (2002)); SOAP2, from Beijing Genomics Institute (Beijing, China) or BGI Americas Corporation (Cambridge, Mass.); Bowtie (Langmead, et al., Genome Biology, 10:R25 (2009)); Efficient Large-Scale Alignment of Nucleotide Databases (ELAND) or the ELANDv2 component of the Consensus Assessment of Sequence and Variation (CASAVA) software (Illumina, San Diego, Calif.); RTG Investigator from Real Time Genomics, Inc. (San Francisco, Calif.); Novoalign from Novocraft (Selangor, Malaysia); Exonerate, European Bioinformatics Institute (Hinxton, UK) (Slater, G., and Birney, E., BMC Bioinformatics 6:31 (2005)); Clustal Omega, from University College Dublin (Dublin, Ireland) (Sievers F., et al., Mol Syst Biol 7, article 539 (2011)); ClustalW or ClustalX from University College Dublin (Dublin, Ireland) (Larkin M. A., et al., Bioinformatics, 23, 2947-2948 (2007)); and FASTA, European Bioinformatics Institute (Hinxton, UK) (Pearson W. R., et al., PNAS 85(8):2444-8 (1988); Lipman, D. J., Science 227(4693):1435-41 (1985)).
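The example scoring scheme described above (1 for a match, 0.33 for a mismatch, −1 per gap) can be illustrated with a textbook Needleman-Wunsch global alignment, followed by a pass that lists substitutions and indels. This is a teaching sketch rather than a production aligner such as BWA or Bowtie: it is quadratic in time and memory, it aligns a read to a reference window of similar length rather than to a whole genome, and the function names are hypothetical.

```python
from math import isclose


def needleman_wunsch(read: str, ref: str, match: float = 1.0,
                     mismatch: float = 0.33, gap: float = -1.0):
    """Global alignment of `read` to `ref`; returns (aligned_read, aligned_ref, score)."""
    n, m = len(read), len(ref)
    # score[i][j] = best score for aligning read[:i] with ref[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,   # extra base in the read
                              score[i][j - 1] + gap)   # base missing from the read
    # Trace back from the last cell, preferring the diagonal move on ties.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            if isclose(score[i][j], score[i - 1][j - 1] + sub, abs_tol=1e-9):
                a.append(read[i - 1])
                b.append(ref[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and isclose(score[i][j], score[i - 1][j] + gap, abs_tol=1e-9):
            a.append(read[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(ref[j - 1])
            j -= 1
    return "".join(reversed(a)), "".join(reversed(b)), score[n][m]


def list_differences(aligned_read: str, aligned_ref: str) -> list[tuple[str, int, str]]:
    """Report (kind, 0-based reference position, detail) for each difference."""
    diffs = []
    ref_pos = 0
    for r, f in zip(aligned_read, aligned_ref):
        if f == "-":  # base present only in the read
            diffs.append(("insertion", ref_pos, r))
        elif r == "-":  # base present only in the reference
            diffs.append(("deletion", ref_pos, f))
            ref_pos += 1
        else:
            if r != f:
                diffs.append(("substitution", ref_pos, f"{f}>{r}"))
            ref_pos += 1
    return diffs
```

Aligning the read `ACGTTCA` to the reference `ACGATCA` reports a single substitution `A>T` at reference position 3, and aligning `ACGCA` to `ACGTCA` reports a deletion of `T` at position 3.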

Illustrations of processes in accordance with some embodiments are provided in FIGS. 36A-36H, in particular for embodiments employing 3′ tailing reactions. FIG. 36A shows cell-free double-stranded polynucleotides 1, 2, 3 . . . K (101) of a sample, each of which contains a genetic locus (100) consisting of a single nucleotide, which may be occupied by a “G” or a rare variant “A”. A sample containing such polynucleotides may be a patient tissue sample, such as a blood or plasma sample, or the like. Typically, reference sequences (e.g. in human genome databases) are available against which the polynucleotide sequences can be compared. Each polynucleotide has four sequence regions, corresponding to the sequences of the two complementary strands at each end. Thus, for example, target polynucleotide 1 of FIG. 36A has sequence regions n1 (110) and n2 (112) at the ends of one strand and complementary sequence regions n1′ (116) and n2′ (108) at the ends of complementary strand (120). Although the sequence regions of the various polynucleotide strands are illustrated as small portions of strands, the sequence regions may comprise the entire segments from the end of a strand to genetic locus (100).

In some embodiments, a 3′ tailing activity is added to the target polynucleotides of the sample, along with nucleic acid monomers and/or other reaction components, to implement tailing reaction (125), which extends the 3′ ends with one or more A's. In this embodiment, the extension of predetermined nucleotides is shown as “A . . . A” to indicate that one or more nucleotides are added, but that the exact number added to each strand may be undetermined (unless an exo-polymerase is used, as noted below). The representation of the added nucleotides by “A . . . A” is not intended to limit the kind of added nucleotides to only A's. The added nucleotides are predetermined in the sense that the kinds of nucleotide precursors used in a tailing reaction are known and selected as an assay design choice. For example, a factor in the selection of a kind of predetermined nucleotide for a particular embodiment may be the efficiency of the circularization step in view of the kind of nucleotide selected. In some embodiments, nucleotide precursors may be nucleoside triphosphates of any of the four nucleotides, either separately, so that homopolymer tails are produced, or in mixtures, so that bi- or tri-nucleotide tails are produced. In some cases, uracil and/or nucleotide analogues may be used in addition to or in place of the four natural DNA bases. In some embodiments in which a CircLigase™ enzyme is used, predetermined nucleotides may be A's and/or T's. In some embodiments, an exo-polymerase is used in a tailing reaction, and only a single deoxyadenylate is added to a 3′ end.

After tailing, and optional separation of the reaction products from the reaction mixture, individual strands are circularized, as shown in FIG. 36B, using a circularization reaction to produce circles (132), each comprising a sequence element of the form “n_(j)−A . . . A−n_(j+1)” (133). After circularization, and optional separation of circles (132) from the reaction mixture, primers (134) are annealed to one or more primer binding sites of circles (132), after which they are extended to produce concatemers, each containing copies of its respective n_(j)−A . . . A−n_(j+1) sequence element, as illustrated in FIG. 36E. After sequencing, complementary strands, such as (136) and (138), may be identified by matching sequence element components, n_(j) and n_(j+1), with their respective complements, n_(j)′ and n_(j+1)′. Selection of primer binding sites on circles (132) is a matter of design choice; alternatively, random sequence primers may be used. In some embodiments, a single primer binding site is selected adjacent to genetic locus (100); in other embodiments, a plurality of primer binding sites are selected, each for a separate primer, to ensure amplification even if a boundary happens to occur in one of the primer binding sites. In some embodiments, two primers with separate primer binding sites are used to produce concatemers.
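Identification of complementary strands from their junction elements can be sketched as below. The sketch assumes each strand is written 5′ to 3′, that the flanking sequences on either side of the added tail can be read unambiguously (in practice an A-tail next to an end that itself ends in A's blurs the boundary), and that a fixed flank length k is compared; the tail length is ignored because it may differ between strands. Function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def junction_element(strand: str, k: int, tail_length: int,
                     tail_base: str = "A") -> tuple[str, str, str]:
    """Junction formed when a 3'-tailed strand (written 5'->3') is circularized:
    the last k bases (3' end region), the added tail, then the first k bases
    (5' end region)."""
    return strand[-k:], tail_base * tail_length, strand[:k]


def from_complementary_strands(j1: tuple[str, str, str],
                               j2: tuple[str, str, str]) -> bool:
    """True if two junction elements could come from the two strands of one
    duplex. Tail lengths are ignored, since they may differ between strands."""
    left1, _, right1 = j1
    left2, _, right2 = j2
    return left2 == revcomp(right1) and right2 == revcomp(left1)
```

Because circularization joins the 3′ end region of a strand to its own 5′ end region, the complementary strand of the same duplex presents the reverse complements of those two regions in swapped order, which is what `from_complementary_strands` checks.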

After identification of pairs of concatemers containing complementary strands, the concatemer sequences may be aligned and base calls at matching positions of the two strands may be compared. At some positions of concatemer pairs, as illustrated by (140) in FIG. 36F, a base called at a given position in one member of a pair may not be complementary to the base called on the other member of the pair, indicating that an incorrect call has been made due to, for example, amplification error, sequencing error, or the like. In this case, the indeterminacy at the given position may be resolved by examining the base calls at corresponding positions of other copies within the concatemer pair. For example, a base call at the given position may be taken to be a consensus, or a majority, of the base calls made for the individual copies in a pair of concatemers. Other methods for making such determinations would be available to one of ordinary skill in the art, and may be used in place of or in addition to these methods to resolve base calls when sequence information from complementary strands is not complementary. In some cases, where bases at a specified position in complementary strands originating from the same double-stranded molecule (e.g. as identified by the 3′ and 5′ end sequences) are not complementary, a base call is resolved in favor of the reference sequence to which the sample sequence is compared, such that the difference is not identified as a true sequence variant with respect to such reference sequence.
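One way to combine the options described above (a per-strand majority over the copies in each concatemer, then a comparison between strands, with a fall-back to the reference base when they disagree) is sketched below. The tie handling and the function names are assumptions for illustration.

```python
from collections import Counter
from typing import Optional

BASE_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def majority(calls: list[str]) -> Optional[str]:
    """Most frequent call, or None if there are no calls or the top two tie."""
    ranked = Counter(calls).most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


def duplex_call(top_copies: list[str], bottom_copies: list[str], ref_base: str) -> str:
    """Call one position from a pair of concatemers built from complementary strands.

    top_copies: calls at the position in each copy of the top-strand concatemer.
    bottom_copies: calls at the matching position in the bottom-strand
    concatemer, as read on that strand; they are complemented before comparison.
    If the two strand consensuses disagree, the call falls back to the reference.
    """
    top = majority(top_copies)
    bottom = majority([BASE_COMPLEMENT[b] for b in bottom_copies])
    if top is not None and top == bottom:
        return top
    return ref_base
```

Taking a majority within each strand first absorbs isolated copy-level errors, while requiring the two strands to agree guards against an error that was propagated to every copy of a single strand.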

In other circumstances, the same error may appear in each copy of a target polynucleotide within a concatemer, as illustrated by (145) in FIG. 36G. Such data would suggest that the target polynucleotide was damaged before amplification or sequencing.

In still other circumstances, only a single concatemer may be identified; that is, a concatemer for which no match is found based on boundary information, such as the length of the segment of predetermined nucleotides, the sequences of adjacent 3′ and 5′ ends, or the like. Such circumstances are illustrated in FIGS. 37A and 37B. There, target polynucleotides (201) comprise single stranded polynucleotide 1 and double stranded polynucleotide 2, each encompassing genetic locus (200). Predetermined nucleotides (for example, adenylates) may be attached to both polynucleotides 1 and 2 in tailing reaction (225) to form 3′ tailed polynucleotides (220). As described above, polynucleotides (220) may then be circularized, amplified by RCA, and sequenced to give concatemer sequences (230), shown in FIG. 37B. If an observed variant is of a type commonly caused by DNA damage, for example C to T or G to T, such information from an unpaired concatemer can still help in deciding whether it is a true mutation or merely DNA damage.

In some embodiments, as illustrated in FIGS. 36C and 36D, primers each containing a molecular tag, e.g. MT1 (150), MT2, and so on, may be annealed to each single stranded circle at predetermined primer binding sites in order to produce concatemers each with a unique tag. The presence of unique molecular tags will distinguish products of single stranded circles that happen to have the same boundary, or n_(j)−A . . . A−n_(j+1) sequence element. Such tags may also be used for counting molecules to determine copy number variation at a genetic locus, for example, in accordance with methods described in Brenner et al., U.S. Pat. No. 7,537,897, or the like, which is incorporated herein by reference. In some embodiments, primers with molecular tags may be selected that have binding sites only on one strand of a target polynucleotide so that concatemers with molecular tags represent only one of the two strands of a target polynucleotide. In other embodiments, circles from complementary strands of a target polynucleotide may each be amplified using a primer having a molecular tag (as illustrated in FIG. 36C).
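As a simplified, non-limiting illustration of counting molecules by tag, the Python sketch below counts distinct molecular tags per locus and scales the count to a reference locus of assumed copy number. This is a generic normalization written for illustration; it is not presented as the method of Brenner et al. The reference locus, its assumed copy number of 2, and the function names are hypothetical.

```python
from collections import defaultdict


def distinct_tags_per_locus(records):
    """Count distinct molecular tags observed at each locus.

    records: iterable of (locus, molecular_tag) pairs, one per concatemer read.
    Repeated reads carrying the same tag at the same locus are counted once,
    since they derive from the same tagged circle.
    """
    tags = defaultdict(set)
    for locus, tag in records:
        tags[locus].add(tag)
    return {locus: len(seen) for locus, seen in tags.items()}


def relative_copy_number(counts, locus, reference_locus, reference_copies=2):
    """Copy number of `locus`, scaled to a reference locus of assumed copy number."""
    return reference_copies * counts[locus] / counts[reference_locus]
```

With four distinct tags at a test locus and two at a reference locus assumed to be diploid, the sketch estimates four copies of the test locus.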

In some embodiments, the above steps for identifying complementary strands of target polynucleotides may be incorporated in a method for detecting rare variants at a genetic locus. In some embodiments, the method comprises the following steps: (a) extending by one or more predetermined nucleotides 3′ ends of the polynucleotides; (b) circularizing individual strands of the polynucleotides to form single stranded polynucleotide circles, the one or more predetermined nucleotides defining a boundary between 3′ sequences and 5′ sequences of each single stranded polynucleotide circle; (c) amplifying by rolling circle replication (RCR) the single stranded polynucleotide circles to form concatemers; (d) sequencing the concatemers; (e) identifying pairs of concatemers containing complementary strands of polynucleotides by the identity of 3′ sequences and 5′ sequences adjacent to the one or more predetermined nucleotides; and (f) determining the sequence of the genetic locus from the sequences of the pairs of concatemers comprising complementary strands of the same polynucleotide. In other embodiments, the step of amplifying by RCR the single stranded circles includes annealing a primer having a 5′-noncomplementary tail to the single stranded circles, wherein such primer includes a unique molecular tag in the 5′-noncomplementary tail, and extending such primer in accordance with an RCR protocol. The resulting product is a concatemer containing a unique molecular tag, which may be counted along with other molecular tags attached to circles from the same locus to provide a copy number measurement for the locus.

In some embodiments, the step of extending may be implemented by tailing by one or more predetermined nucleotides 3′ ends of the polynucleotides in a tailing reaction. In some embodiments, such tailing may be implemented by an untemplated 3′ nucleotide addition activity, such as a TdT activity, an exo-polymerase activity, or the like.

Using the steps described above, concatemer sequences can be identified from polynucleotide sequences. In large-scale parallel sequencing (also referred to as “next generation sequencing” or NGS), reads containing concatemers can be identified and used to perform error correction and find sequence variants. Junctions of the original input molecules (the start and the end of the DNA/RNA sequence) can be reconstructed from the concatemers by aligning them to reference sequences, and the junctions can be used to identify the original input molecule and to remove sequencing duplicates for more accurate counting. The strand identity of each read, which may contain a concatemer, can be computed by aligning the reads to reference sequences and checking the sequence element components, n_(j) and n_(j+1), as described in FIG. 36A. Variants found in both concatemers labeled as complementary strands have a higher statistical confidence level, which can be used to perform further error correction. Variant confirmation using strand identity may be carried out by (but is not limited to) the following steps: a) variants found in reads with complementary strand identities are considered more confident; b) reads carrying variants can be grouped by their junction identification, and the variants are more confident when complementary strand identities are found in reads within a group of reads having the same junction identification; c) reads carrying variants can be grouped by their molecular barcodes, or by the combination of molecular barcodes and junction identifications, and the variants are more confident when complementary strand identities are found in reads within a group of reads having the same molecular barcodes and/or junction identifications.
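Steps a) and b) above can be expressed as confidence tiers, as in the non-limiting Python sketch below. It assumes that each read has already been assigned a junction ID expressed in reference coordinates, so that both strands of one input molecule share the same junction ID, and a strand identity of "+" or "−" (written "-" in code). The `Read` record, the numeric tiers, and the variant labels are hypothetical.

```python
from collections import defaultdict
from typing import FrozenSet, NamedTuple


class Read(NamedTuple):
    junction_id: str          # assumed: reference start/end of the input molecule
    strand: str               # strand identity, "+" or "-"
    variants: FrozenSet[str]  # variants carried by the read, e.g. "chr1:100C>T"


def variant_confidence(reads, variant):
    """Return a confidence tier for `variant`.

    2: complementary strand identities found within one junction group;
    1: complementary strand identities found, but only across groups;
    0: variant seen with a single strand identity, or not at all.
    """
    strands_by_junction = defaultdict(set)
    for read in reads:
        if variant in read.variants:
            strands_by_junction[read.junction_id].add(read.strand)
    if any(len(strands) == 2 for strands in strands_by_junction.values()):
        return 2
    all_strands = set().union(*strands_by_junction.values())
    return 1 if len(all_strands) == 2 else 0
```

Grouping by molecular barcode, as in step c), would follow the same pattern with the barcode, or a (barcode, junction) tuple, used as the grouping key.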

Error correction using molecular barcodes and junction identification can be used independently, or combined with the error correction with concatemer sequencing as described in the previous steps: a) reads with different molecular barcodes (or junction identifications) can be grouped into different read families, which represent reads originating from different input molecules; b) consensus sequences can be built from each family of reads; c) the consensus sequences can be used for variant calling; d) molecular barcodes and junction identifications can be combined to form a composite ID for reads, which helps identify the original input molecules. In some embodiments, a base call (e.g. a sequence difference with respect to a reference sequence) found in different read families is assigned a higher confidence. In some cases, a sequence difference is only identified as a true sequence variant representative of the original source polynucleotide (as opposed to an error of sample processing or analysis) if the sequence difference passes one or more filters that increase confidence of a base call, such as those described above. In some embodiments, a sequence difference is only identified as a true sequence variant if (a) it is identified on both strands of a double-stranded input molecule; (b) it occurs in the consensus sequence for the concatemer from which it originates (e.g. more than 50%, 80%, 90% or more of the repeats within the concatemer contain the sequence difference); and/or (c) it occurs in two different molecules (e.g. as identified by different 3′ and 5′ endpoints, and/or by an exogenous tag sequence).
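The family-based filters above may be sketched in Python as follows, as a non-limiting illustration for a single reference position. Reads are grouped into families by a composite (barcode, junction) ID; a family consensus is kept only when one base exceeds a minimum fraction of the family (50% here, matching the lowest example threshold above); and an alternate base is called only if it is the consensus of at least two families. The thresholds and function names are illustrative assumptions.

```python
from collections import Counter, defaultdict


def family_consensus(reads, min_fraction=0.5):
    """Consensus base per read family at one reference position.

    reads: iterable of (barcode, junction_id, base) tuples.
    Returns {(barcode, junction_id): base or None}; None means no base exceeds
    `min_fraction` of the family's reads.
    """
    families = defaultdict(list)
    for barcode, junction_id, base in reads:
        families[(barcode, junction_id)].append(base)
    consensus = {}
    for family_id, bases in families.items():
        base, n = Counter(bases).most_common(1)[0]
        consensus[family_id] = base if n / len(bases) > min_fraction else None
    return consensus


def call_from_families(reads, ref_base, min_families=2, min_fraction=0.5):
    """Alternate bases supported by the consensus of at least `min_families` families."""
    consensus = family_consensus(reads, min_fraction)
    support = Counter(b for b in consensus.values() if b is not None and b != ref_base)
    return sorted(alt for alt, n in support.items() if n >= min_families)
```

Because a processing error typically affects a single family, requiring agreement from independent families corresponds to filter (c) above.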

Determining strand identity: 1) junctions of the original input molecules can be reconstructed from reads, which may contain concatemer sequences, by aligning the sequences to reference sequences; 2) the junctions can be located in the reads using the alignments; 3) the sequence element components, n_(j) and n_(j+1), as described in FIG. 36A, which represent the strand identity, can be extracted from the sequence based on the junction locations in the reads; in the case of a concatemer, the sequence can be found between the junctions in the concatemer sequences; 4) the strand (positive or negative) of the reference sequence that the reads align to, combined with the strand identity sequences within the reads identified in step 3, can be used to identify the original strand that was incorporated into the sequence library and sequenced, and to identify which strand a sequence variant originated from. For example, suppose a strand identity sequence “AA” is added to the end of a strand of an original input DNA fragment. If, after sequencing, the read of the DNA fragment aligns to the “+” strand of the reference and the strand identity sequence in the read is “AA”, the original input strand is the “+” strand; if the strand identity sequence is “TT”, the read is the reverse complement of the original input strand, and the original input strand is the “−” strand. Determining strand identity allows a sequence variant to be distinguished from its reverse complementary counterpart, for example, a C>T substitution from a G>A substitution.
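The worked example above, in which a tail of “AA” is read as either “AA” or “TT”, may be expressed as the following Python sketch. The handling of reads aligned to the “−” strand is a direct extension of the “+” case, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_FLIP = {"+": "-", "-": "+"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def original_strand(aligned_strand: str, identity_seq: str, tail: str = "AA") -> str:
    """Infer the original input strand from alignment strand and strand identity sequence."""
    if identity_seq == tail:
        return aligned_strand          # read has the same sense as the input strand
    if identity_seq == revcomp(tail):
        return _FLIP[aligned_strand]   # read is the reverse complement of the input strand
    raise ValueError(f"unrecognized strand identity sequence: {identity_seq!r}")


def change_on_original_strand(ref: str, alt: str, strand: str) -> str:
    """Express a reference-orientation change (e.g. C>T) on the original strand."""
    if strand == "-":
        ref, alt = revcomp(ref), revcomp(alt)
    return f"{ref}>{alt}"
```

A C>T difference in reference orientation observed on an input strand inferred to be “−” is thus recorded as a G>A change on that strand, which is the distinction used for allele-specific error reduction.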
The precise identification of allele changes can be used to carry out allele-specific error reduction in variant calling. For example, some types of DNA damage occur more often as certain allele changes, and allele-specific error reduction can be carried out to suppress such damage. Such error reduction can be done by various statistical methods, for example, 1) calculating the distribution of different allele changes in sequencing data (baseline), followed by 2) a z-test or other statistical test to determine whether an observed allele change differs from the baseline distribution.
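A minimal sketch of steps 1) and 2), assuming a one-sided one-proportion z-test, is shown below in Python. The baseline rate of each strand-specific change is estimated as the number of times the change was observed divided by the number of opportunities (e.g. reference bases of the relevant type) in baseline data. The normal approximation is only a rough guide when counts are very small, and the function names are hypothetical.

```python
import math


def baseline_rates(change_counts, opportunity_counts):
    """Per-change baseline rate, e.g. {"G>T": 100} / {"G>T": 1_000_000}."""
    return {c: change_counts[c] / opportunity_counts[c] for c in change_counts}


def z_test_above_baseline(k, n, p0):
    """One-sided test that an observed change rate k/n exceeds baseline rate p0.

    Returns (z, p_value) using the normal approximation to the binomial.
    """
    p_hat = k / n
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return z, p_value
```

With a baseline G>T rate of 1 in 10,000, observing 10 G>T changes in 10,000 strands gives z of about 9, whereas a single observation gives z = 0 and a p-value of 0.5.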

In some embodiments, the present disclosure provides a method of identifying a genetic variant on a particular strand at a genetic locus by comparing the frequency of a measured sequence, or one or more nucleotides, to a baseline frequency of nucleotide damage that results in the same sequence, or one or more nucleotides, as the measured sequence. In some embodiments, such a method may comprise the following steps: (a) extending by one or more predetermined nucleotides 3′ ends of the polynucleotides; (b) amplifying individual strands of the extended polynucleotides; (c) sequencing the amplified individual strands of the extended polynucleotides; (d) identifying complementary strands of polynucleotides by the identity of 3′ sequences and/or 5′ sequences adjacent to the one or more predetermined nucleotides and identifying nucleotides of each strand at the genetic locus; and (e) determining a frequency of each of one or more nucleotides at the genetic locus from the identified concatemers for identifying the genetic variant. In some embodiments, this method may be used to distinguish a genetic variant from nucleotide damage by the following step: calling at least one of said one or more nucleotides at said genetic locus on said strand identified by said one or more predetermined nucleotides as said genetic variant whenever said frequency of strands displaying the at least one nucleotide exceeds by a predetermined factor a baseline frequency of strands having nucleotide damage that gives rise to the same nucleotide.
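The calling step above may be sketched in Python as follows, as a non-limiting illustration that reads “exceeds by a predetermined factor” as a frequency greater than the factor multiplied by the baseline. The factor of 10, the fallback baseline used for changes without a measured baseline, and the function name are hypothetical choices.

```python
def call_strand_variants(base_counts, ref_base, damage_baseline, factor=10.0, min_baseline=1e-5):
    """Call non-reference nucleotides on one strand at one genetic locus.

    base_counts: strand-specific counts of each nucleotide at the locus,
                 e.g. {"C": 9900, "T": 60, "A": 40}.
    damage_baseline: baseline frequency of strands showing each change as a
                 result of nucleotide damage, e.g. {"C>T": 1e-3}.
    A nucleotide is called when its frequency exceeds `factor` times the
    baseline for the corresponding change (or `min_baseline` if none is known).
    """
    total = sum(base_counts.values())
    if total == 0:
        return []
    calls = []
    for base in sorted(base_counts):
        if base == ref_base:
            continue
        baseline = damage_baseline.get(f"{ref_base}>{base}", min_baseline)
        if base_counts[base] / total > factor * baseline:
            calls.append(base)
    return calls
```

In the example counts, T at 0.6% is not called because C>T damage has a high assumed baseline of 0.1%, whereas A at 0.4% is called against a lower C>A baseline. A damage-prone change therefore needs proportionally more support.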

As mentioned above, in some embodiments, the step of amplifying may be carried out by (i) circularizing individual strands of the polynucleotides to form single stranded polynucleotide circles, the one or more predetermined nucleotides defining a boundary between 3′ sequences and 5′ sequences of the polynucleotides in each single stranded polynucleotide circle; and (ii) amplifying by rolling circle replication the single stranded polynucleotide circles to form concatemers of the single stranded polynucleotide circles.

A baseline frequency of strands having nucleotide damage may be based on prior measurements on samples from the same individual who is being tested by the method, or a baseline frequency may be based on prior measurements on a population of individuals other than the individual being tested. A baseline frequency may also depend on and/or be specific for the kind of steps or protocol used in preparing a sample for analysis by a method of the disclosure. By comparing measured frequencies with baseline frequencies, a statistical measure may be obtained of a likelihood (or confidence level) that a measured or determined sequence is a genuine genetic variant and not damage or error due to processing.

Typically, the sequencing data is acquired from large scale, parallel sequencing reactions. Many next generation high-throughput sequencing systems export data as FASTQ files, although other formats may be used. In some embodiments, sequences are analyzed to identify the repeat unit length (e.g. the monomer length), the junction formed by circularization, and any true variation with respect to a reference sequence, typically through sequence alignment. Identifying the repeat unit length can include computing the regions of the repeated units, finding the reference loci of the sequences (e.g. when one or more sequences are particularly targeted for amplification, enrichment, and/or sequencing), the boundaries of each repeated region, and/or the number of repeats within each sequencing read. Sequence analysis can include analyzing sequence data for both strands of a duplex. As noted above, in some embodiments, an identical variant that appears in the sequences of reads from different polynucleotides from the sample (e.g. circularized polynucleotides having different junctions) is considered a confirmed variant. In some embodiments, a sequence variant may also be considered a confirmed, or genuine, variant if it occurs in more than one repeated unit of the same polynucleotide, as the same sequence variation is likewise unlikely to occur at the same position in a repeated target sequence within the same concatemer. The quality score of a sequence may be considered in identifying variants and confirmed variants; for example, sequences and bases with quality scores lower than a threshold may be filtered out. Other bioinformatics methods can be used to further increase the sensitivity and specificity of the variant calls.
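One simple way to estimate the repeat unit length of a concatemer read without a reference is to find the smallest shift at which the read nearly matches itself. The Python sketch below uses this self-comparison approach, which is an illustrative assumption rather than the alignment-based procedure described above. The mismatch tolerance, which absorbs sequencing errors, the minimum period, and the function names are hypothetical.

```python
def repeat_unit_length(read: str, min_period: int = 20, max_mismatch_frac: float = 0.05):
    """Smallest period p at which the read nearly matches itself shifted by p.

    Requires at least two full copies (p <= len(read) // 2). Mismatches up to
    `max_mismatch_frac` of the compared positions are tolerated. Returns None
    if no such period is found.
    """
    n = len(read)
    for p in range(min_period, n // 2 + 1):
        compared = n - p
        mismatches = sum(1 for i in range(compared) if read[i] != read[i + p])
        if mismatches <= max_mismatch_frac * compared:
            return p
    return None


def repeat_count(read: str, period: int) -> float:
    """Number of (possibly partial) repeat units covered by the read."""
    return len(read) / period
```

For a 67-base read made of three copies of a 20-base unit plus a 7-base partial copy, the sketch returns a period of 20 and 3.35 repeat units, even when one copy carries a single substitution.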

In some embodiments, statistical analyses may be applied to determine variants (mutations) and to quantitate the ratio of each variant in the total DNA sample. The total measurement of a particular base can be calculated using the sequencing data. For example, from the alignment results calculated in previous steps, one can calculate the number of “effective reads,” that is, the number of confirmed reads for each locus. The allele frequency of a variant can be normalized by the effective read count for the locus. The overall noise level, that is, the average rate of observed variants across all loci, can be computed. The frequency of a variant and the overall noise level, combined with other factors, can be used to determine the confidence interval of the variant call. Statistical models such as Poisson distributions can be used to assess the confidence interval of the variant calls. The allele frequency of variants can also be used as an indicator of the relative quantity of the variant in the total sample.
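The quantities described above may be combined as in the following non-limiting Python sketch. The noise level is taken as the mean per-locus rate of observed variants. A variant's allele frequency is its supporting read count divided by the effective reads at the locus, and its significance is the Poisson probability of seeing at least that many variant reads if only noise were present, with mean equal to the noise level multiplied by the effective reads. Using a one-sided Poisson tail probability is one possible reading of the text; the function names are hypothetical.

```python
import math


def noise_level(per_locus_alt, per_locus_depth):
    """Average rate of observed variants across loci with nonzero depth."""
    rates = [a / d for a, d in zip(per_locus_alt, per_locus_depth) if d > 0]
    return sum(rates) / len(rates)


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))  # P(X = k)
    total, i = 0.0, k
    while True:
        total += term
        i += 1
        term *= lam / i
        # Past the mode the terms shrink; stop once they no longer matter.
        if i > lam and term <= total * 1e-16:
            break
    return min(total, 1.0)


def assess_variant(alt_reads, effective_reads, noise):
    """Allele frequency and Poisson p-value of a variant against the noise level."""
    allele_freq = alt_reads / effective_reads
    return allele_freq, poisson_sf(alt_reads, noise * effective_reads)
```

For example, 12 variant reads out of 20,000 effective reads at a noise level of 1 in 10,000 give an allele frequency of 0.06% and a p-value on the order of 10^-6, since only about 2 noise reads would be expected.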

In some embodiments, a microbial contaminant is identified based on the calling step. For example, a particular sequence variant may indicate contamination by a potentially infectious microbe. Sequence variants may be identified within a highly conserved polynucleotide for the purpose of identifying a microbe. Exemplary highly conserved polynucleotides useful in the phylogenetic characterization and identification of microbes comprise nucleotide sequences found in the 16S rRNA gene, 23S rRNA gene, 5S rRNA gene, 5.8S rRNA gene, 12S rRNA gene, 18S rRNA gene, 28S rRNA gene, gyrB gene, rpoB gene, fusA gene, recA gene, cox1 gene and nifD gene. With eukaryotes, the rRNA gene can be nuclear, mitochondrial, or both. In some embodiments, sequence variants in the 16S-23S rRNA gene internal transcribed spacer (ITS) can be used for differentiation and identification of closely related taxa with or without the use of other rRNA genes. Due to structural constraints of 16S rRNA, specific regions throughout the gene have a highly conserved polynucleotide sequence, although non-structural segments may have a high degree of variability. Identifying sequence variants can be used to identify operational taxonomic units (OTUs) that represent a subgenus, a genus, a subfamily, a family, a sub-order, an order, a sub-class, a class, a sub-phylum, a phylum, a sub-kingdom, or a kingdom, and optionally to determine their frequency in a population. The detection of particular sequence variants can be used in detecting the presence, and optionally the amount (relative or absolute), of a microbe indicative of contamination.
Example applications include water quality testing for fecal or other contamination, testing for animal or human pathogens, pinpointing sources of water contamination, testing reclaimed or recycled water, testing sewage discharge streams including ocean discharge plumes, monitoring of aquaculture facilities for pathogens, monitoring beaches, swimming areas or other water related recreational facilities, and predicting toxic algal blooms. Food monitoring applications include the periodic testing of production lines at food processing plants, surveying slaughter houses, and inspecting the kitchens and food storage areas of restaurants, hospitals, schools, correctional facilities and other institutions for food borne pathogens such as E. coli strains O157:H7 or O111:B4, Listeria monocytogenes, or Salmonella enterica subsp. enterica serovar Enteritidis. Shellfish and shellfish producing waters can be surveyed for algae responsible for paralytic shellfish poisoning, neurotoxic shellfish poisoning, diarrhetic shellfish poisoning and amnesic shellfish poisoning. Additionally, imported foodstuffs can be screened while in customs before release to ensure food security. Plant pathogen monitoring applications include horticulture and nursery monitoring, for instance monitoring for Phytophthora ramorum, the microorganism responsible for Sudden Oak Death, as well as crop pathogen surveillance and disease management and forestry pathogen surveillance and disease management.
Manufacturing environments for pharmaceuticals, medical devices, and other consumables or critical components where microbial contamination is a major safety concern can be surveyed for the presence of specific pathogens like Pseudomonas aeruginosa or Staphylococcus aureus, the presence of more common microorganisms associated with humans, microorganisms associated with the presence of water, or others that represent the bioburden that was previously identified in that particular environment or in similar ones. Similarly, the construction and assembly areas for sensitive equipment, including spacecraft, can be monitored for previously identified microorganisms that are known to inhabit or are most commonly introduced into such environments.

In some embodiments, the method comprises identifying a sequence variant in a nucleic acid sample comprising less than 50 ng of polynucleotides, each polynucleotide having a 5′ end and a 3′ end. In some embodiments, the method comprises: (a) circularizing with a ligase individual polynucleotides in said sample to form a plurality of circular polynucleotides; (b) upon separating said ligase from said circular polynucleotides, amplifying the circular polynucleotides to form concatemers; (c) sequencing the concatemers to produce a plurality of sequencing reads; (d) identifying sequence differences between the plurality of sequencing reads and a reference sequence; and (e) calling a sequence difference that occurs with a frequency of 0.05% or higher in said plurality of reads from said nucleic acid sample of less than 50 ng polynucleotides as the sequence variant.

The starting amount of polynucleotides in a sample may be small. In some embodiments, the amount of starting polynucleotides is less than 100 ng. In some embodiments, the amount of starting material is less than 75 ng. In some embodiments, the amount of starting material is less than 50 ng, such as less than 45 ng, 40 ng, 35 ng, 30 ng, 25 ng, 20 ng, 15 ng, 10 ng, 5 ng, 4 ng, 3 ng, 2 ng, 1 ng, 0.5 ng, 0.1 ng, or less. In some embodiments, the amount of starting polynucleotides is in the range of 0.1-100 ng, such as between 1-75 ng, 5-50 ng, or 10-20 ng. In general, the lower the amount of starting material, the more important it is to maximize recovery from the various processing steps. Processes that reduce the amount of polynucleotides in a sample available for participation in a subsequent reaction decrease the sensitivity with which rare mutations can be detected. For example, methods described by Lou et al. (PNAS, 2013, 110 (49)) are expected to recover only 10-20% of the starting material. For large amounts of starting material (e.g. as purified from lab-cultured bacteria), this may not be a substantial obstacle. However, for samples where the starting material is substantially lower, recovery in this low range can be a substantial obstacle to detection of sufficiently rare variants. Accordingly, in some embodiments, sample recovery from one step to another in a method of the disclosure (e.g. the mass fraction of input into a circularization step available for input into a subsequent amplification step or sequencing step) is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, or more. Recovery from a particular step may be close to 100%. Recovery may be with respect to a particular form, such as recovery of circular polynucleotides from an input of non-circular polynucleotides.
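The effect of recovery on rare-variant detection can be illustrated with simple arithmetic, sketched below in Python. The sketch assumes that per-step recoveries multiply independently and that the number of variant-bearing molecules surviving to sequencing is Poisson distributed. The input of 10,000 target molecules and the 0.1% variant frequency are hypothetical values chosen only for illustration, and the function names are likewise hypothetical.

```python
import math


def overall_recovery(step_recoveries):
    """Fraction of input surviving all steps, assuming independent per-step recoveries."""
    return math.prod(step_recoveries)


def expected_variant_molecules(input_molecules, variant_freq, step_recoveries):
    """Expected number of variant-bearing molecules that reach sequencing."""
    return input_molecules * variant_freq * overall_recovery(step_recoveries)


def prob_variant_lost(input_molecules, variant_freq, step_recoveries):
    """Poisson probability that no variant-bearing molecule reaches sequencing."""
    return math.exp(-expected_variant_molecules(input_molecules, variant_freq, step_recoveries))
```

Under these assumptions, 10,000 input molecules at 0.1% carry 10 variant molecules. A single step with 15% recovery leaves about 1.5 of them, and the variant is lost entirely about 22% of the time. Three steps at 90% recovery each leave about 7.3, and the chance of losing the variant falls below 0.1%.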

The polynucleotides may be from any suitable sample, such as a sample described herein with respect to the various aspects of the disclosure. Polynucleotides from a sample may be any of a variety of polynucleotides, including but not limited to, DNA, RNA, ribosomal RNA (rRNA), transfer RNA (tRNA), micro RNA (miRNA), messenger RNA (mRNA), fragments of any of these, or combinations of any two or more of these. In some embodiments, samples comprise DNA. In some embodiments, the polynucleotides are single-stranded, either as obtained or by way of treatment (e.g. denaturation). Further examples of suitable polynucleotides are described herein, such as with respect to any of the various aspects of the disclosure. In some embodiments, polynucleotides are subjected to subsequent steps (e.g. circularization and amplification) without an extraction step and/or without a purification step. For example, a fluid sample may be treated to remove cells without an extraction step to produce a purified liquid sample and a cell sample, followed by isolation of DNA from the purified fluid sample. A variety of procedures for isolation of polynucleotides are available, such as by precipitation or non-specific binding to a substrate followed by washing the substrate to release bound polynucleotides. Where polynucleotides are isolated from a sample without a cellular extraction step, polynucleotides will largely be extracellular or “cell-free” polynucleotides, such as cell-free DNA and cell-free RNA, which may correspond to dead or damaged cells. The identity of such cells may be used to characterize the cells or population of cells from which they are derived, such as in a microbial community. If a sample is treated to extract polynucleotides, such as from cells in a sample, a variety of extraction methods are available, examples of which are provided herein (e.g. with regard to any of the various aspects of the disclosure).

The sequence variant in the nucleic acid sample can be any of a variety of sequence variants. Multiple non-limiting examples of sequence variants are described herein, such as with respect to any of the various aspects of the disclosure. In some embodiments, the sequence variant is a single nucleotide polymorphism (SNP). In some embodiments, the sequence variant occurs with a low frequency in the population (also referred to as a “rare” sequence variant). For example, the sequence variant may occur with a frequency of about or less than about 5%, 4%, 3%, 2%, 1.5%, 1%, 0.75%, 0.5%, 0.25%, 0.1%, 0.075%, 0.05%, 0.04%, 0.03%, 0.02%, 0.01%, 0.005%, 0.001%, or lower. In some embodiments, the sequence variant occurs with a frequency of about or less than about 0.1%.

According to some embodiments, polynucleotides of a sample are circularized, such as by use of a ligase. Circularization can include joining the 5′ end of a polynucleotide to the 3′ end of the same polynucleotide, to the 3′ end of another polynucleotide in the sample, or to the 3′ end of a polynucleotide from a different source (e.g. an artificial polynucleotide, such as an oligonucleotide adapter). In some embodiments, the 5′ end of a polynucleotide is joined to the 3′ end of the same polynucleotide (also referred to as “self-joining”). Non-limiting examples of circularization processes (e.g. with and without adapter oligonucleotides), reagents (e.g. types of adapters, use of ligases), reaction conditions (e.g. favoring self-joining), and optional additional processing (e.g. post-reaction purification) are provided herein, such as with regard to any of the various aspects of the disclosure.

As previously described, joining the ends of a polynucleotide to one another to form a circular polynucleotide (either directly, or with one or more intermediate adapter oligonucleotides) generally produces a junction having a junction sequence. Where the 5′ end and 3′ end of a polynucleotide are joined via an adapter polynucleotide, the term “junction” can refer to a junction between the polynucleotide and the adapter (e.g. one of the 5′ end junction or the 3′ end junction), or to the junction between the 5′ end and the 3′ end of the polynucleotide as formed by and including the adapter polynucleotide. Where the 5′ end and the 3′ end of a polynucleotide are joined without an intervening adapter (e.g. the 5′ end and 3′ end of a single-stranded DNA), the term “junction” refers to the point at which these two ends are joined. A junction may be identified by the sequence of nucleotides comprising the junction (also referred to as the “junction sequence”). In some embodiments, samples comprise polynucleotides having a mixture of ends formed by natural degradation processes (such as cell lysis, cell death, and other processes by which DNA is released from a cell to its surrounding environment, in which it may be further degraded, such as in cell-free polynucleotides, such as cell-free DNA), fragmentation that is a byproduct of sample processing (such as fixing, staining, and/or storage procedures), and fragmentation by methods that cleave DNA without restriction to specific target sequences (e.g. mechanical fragmentation, such as by sonication; non-sequence-specific nuclease treatment, such as DNase I or fragmentase). Where samples comprise polynucleotides having a mixture of ends, the likelihood that two polynucleotides will have the same 5′ end or 3′ end is low, and the likelihood that two polynucleotides will independently have both the same 5′ end and 3′ end is extremely low.
Accordingly, in some embodiments, junctions may be used to distinguish different polynucleotides, even where the two polynucleotides comprise a portion having the same target sequence. Where polynucleotide ends are joined without an intervening adapter, a junction sequence may be identified by alignment to a reference sequence. For example, where the order of two component sequences appears to be reversed with respect to the reference sequence, the point at which the reversal appears to occur may be an indication of a junction at that point. Where polynucleotide ends are joined via one or more adapter sequences, a junction may be identified by proximity to the known adapter sequence, or by alignment as above if a sequencing read is of sufficient length to obtain sequence from both the 5′ and 3′ ends of the circularized polynucleotide. In some embodiments, the formation of a particular junction is a sufficiently rare event such that it is unique among the circularized polynucleotides of a sample.
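The reversal-based identification of an adapter-free junction may be sketched in Python as follows, as a non-limiting illustration. The sketch uses exact substring matching in place of a real aligner and assumes a read that spans the junction exactly once. It searches for a split of the read into two halves that both occur in the reference but in reversed order: the first half is the 3′ end of the original molecule and the second half is its 5′ start. Microhomology at the junction makes the split position ambiguous, and the sketch reports the leftmost valid split. The function names and the minimum anchor length are hypothetical.

```python
def find_junction(read, reference, min_anchor=8):
    """Locate a self-joining junction in a read that spans it once.

    Returns (end, start): the reference coordinate just past the molecule's
    3' end and the coordinate of its 5' start, or None if no reversed split
    is found. Exact matching only.
    """
    for j in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:j], read[j:]
        a = reference.find(left)    # 3' segment of the molecule
        s = reference.find(right)   # 5' segment of the molecule
        # Reversed order: the 5' segment must lie upstream of the 3' segment.
        if a != -1 and s != -1 and s + len(right) <= a:
            return a + len(left), s
    return None


def count_molecules(reads, reference, min_anchor=8):
    """Number of distinct input molecules, treating reads sharing a junction as duplicates."""
    junctions = {find_junction(r, reference, min_anchor) for r in reads}
    junctions.discard(None)
    return len(junctions)
```

Reads that resolve to the same (end, start) pair are treated as copies of one circularized molecule, while reads with different junctions count as independent molecules even when they cover the same target sequence.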

After circularization, reaction products may be purified prior to amplification or sequencing to increase the relative concentration or purity of circularized polynucleotides available for participating in subsequent steps (e.g. by isolation of circular polynucleotides or removal of one or more other molecules in the reaction). For example, a circularization reaction or components thereof may be treated to remove single-stranded (non-circularized) polynucleotides, such as by treatment with an exonuclease. As a further example, a circularization reaction or portion thereof may be subjected to size exclusion chromatography, whereby small reagents are retained and discarded (e.g. unreacted adapters), or circularization products are retained and released in a separate volume. A variety of kits for cleaning up ligation reactions are available, such as the oligo purification kits made by Zymo Research. In some embodiments, purification comprises treatment to remove or degrade ligase used in the circularization reaction, and/or to purify circularized polynucleotides away from such ligase. In some embodiments, treatment to degrade ligase comprises treatment with a protease. Suitable proteases are available from prokaryotes, viruses, and eukaryotes. Examples of proteases include proteinase K (from Tritirachium album), pronase E (from Streptomyces griseus), Bacillus polymyxa protease, thermolysin (from thermophilic bacteria), trypsin, subtilisin, furin, and the like. In some embodiments, the protease is proteinase K. Protease treatment may follow manufacturer protocols or be carried out under standard conditions (e.g. as provided in Sambrook and Green, Molecular Cloning: A Laboratory Manual, 4th Edition (2012)). Protease treatment may also be followed by extraction and precipitation.
In one example, circularized polynucleotides are purified by proteinase K (Qiagen) treatment in the presence of 0.1% SDS and 20 mM EDTA, extracted with 1:1 phenol/chloroform and chloroform, and precipitated with ethanol or isopropanol. In some embodiments, precipitation is in ethanol.

As described with respect to other aspects of the disclosure, circularization may be followed directly by sequencing the circularized polynucleotides. Alternatively, sequencing may be preceded by one or more amplification reactions. A variety of methods of amplifying polynucleotides (e.g. DNA and/or RNA) are available. Amplification may be linear, exponential, or involve both linear and exponential phases in a multi-phase amplification process. Amplification methods may involve changes in temperature, such as a heat denaturation step, or may be isothermal processes that do not require heat denaturation. Non-limiting examples of suitable amplification processes are described herein, such as with regard to any of the various aspects of the disclosure. In some embodiments, amplification comprises rolling circle amplification (RCA). As described elsewhere herein, a typical RCA reaction mixture comprises one or more primers, a polymerase, and dNTPs, and produces concatemers. Typically, the polymerase in an RCA reaction is a polymerase having strand-displacement activity. A variety of such polymerases are available, non-limiting examples of which include exonuclease minus DNA Polymerase I large (Klenow) Fragment, Phi29 DNA polymerase, Taq DNA Polymerase and the like. In general, a concatemer is a polynucleotide amplification product comprising two or more copies of a target sequence from a template polynucleotide (e.g. about or more than about 2, 3, 4, 5, 6, 7, 8, 9, 10, or more copies of the target sequence; in some embodiments, about or more than about 2 copies). Amplification primers may be of any suitable length, such as about or at least about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, or more nucleotides, any portion or all of which may be complementary to the corresponding target sequence to which the primer hybridizes (e.g. about, or at least about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, or more nucleotides).
Examples of various RCA processes are described herein, such as the use of random primers, target-specific primers, and adapter-targeted primers, some of which are illustrated in FIGS. 7A-7C.
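The concatemer definition above (an amplification product carrying two or more copies of a target sequence) lends itself to a simple read-level check. The following Python sketch is illustrative only: it assumes error-free reads and exact string matching, and the function names `count_target_copies` and `is_concatemer` are hypothetical rather than part of the disclosure.

```python
def count_target_copies(read: str, target: str) -> int:
    """Count non-overlapping, exact occurrences of `target` in `read`.

    Assumes error-free reads; a practical tool would tolerate mismatches.
    """
    if not target:
        raise ValueError("target must be non-empty")
    # str.count scans left to right and does not count overlapping matches.
    return read.count(target)


def is_concatemer(read: str, target: str, min_copies: int = 2) -> bool:
    """Treat a read as concatemeric if it carries at least min_copies copies."""
    return count_target_copies(read, target) >= min_copies


# Two full copies of a hypothetical target plus a partial third copy at the end.
read = "TT" + "GATTACA" + "CCG" + "GATTACA" + "CCG" + "GATT"
print(count_target_copies(read, "GATTACA"))  # 2
print(is_concatemer(read, "GATTACA"))        # True
```

Partial copies at the read ends are not counted, so the count is a lower bound on the number of copies in the underlying concatemer.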

Where circularized polynucleotides are amplified prior to sequencing (e.g. to produce concatemers), amplified products may be subjected to sequencing directly without enrichment, or subsequent to one or more enrichment steps. Non-limiting examples of suitable enrichment processes are described herein, such as with respect to any of the various aspects of the disclosure (e.g. use of B2B primers for a second amplification step). According to some embodiments, circularized polynucleotides (or amplification products thereof, which may have optionally been enriched) are subjected to a sequencing reaction to generate sequencing reads. Sequencing reads produced by such methods may be used in accordance with other methods disclosed herein. A variety of sequencing methodologies are available, particularly high-throughput sequencing methodologies. Examples include, without limitation, sequencing systems manufactured by Illumina (sequencing systems such as HiSeq® and MiSeq®), Life Technologies (Ion Torrent®, SOLiD®, etc.), Roche's 454 Life Sciences systems, Pacific Biosciences systems, etc. In some embodiments, sequencing comprises use of HiSeq® and MiSeq® systems to produce reads of about or more than about 50, 75, 100, 125, 150, 175, 200, 250, 300, or more nucleotides in length. Additional non-limiting examples of sequencing platforms and methodologies are described herein, such as with respect to any of the various aspects of the disclosure.

According to some embodiments, a sequence difference between sequencing reads and a reference sequence is called as a genuine sequence variant (e.g. existing in the sample prior to amplification or sequencing, and not a result of either of these processes) if it occurs in at least two different polynucleotides (e.g. two different circular polynucleotides, which can be distinguished by their different junctions, or two different polynucleotides having a different 5′ end and/or a different 3′ end). Because sequence variants that result from amplification or sequencing errors are unlikely to be duplicated exactly (e.g. in position and type) on two different polynucleotides comprising the same target sequence, adding this validation parameter greatly reduces the background of erroneous sequence variants, with a concurrent increase in the sensitivity and accuracy of detecting actual sequence variation in a sample. In some embodiments, a sequence variant having a frequency of about or less than about 5%, 4%, 3%, 2%, 1.5%, 1%, 0.75%, 0.5%, 0.25%, 0.1%, 0.075%, 0.05%, 0.04%, 0.03%, 0.02%, 0.01%, 0.005%, 0.001%, or lower is sufficiently above background to permit an accurate call. In some embodiments, the sequence variant occurs with a frequency of about or less than about 0.1%. In some embodiments, the method comprises calling as genuine sequence variants those sequence differences having a frequency in the range of about 0.0005% to about 3%, such as 0.001%-2% or 0.01%-1%. In some embodiments, the frequency of a sequence variant is sufficiently above background when such frequency is statistically significantly above the background error rate (e.g. with a p-value of about or less than about 0.05, 0.01, 0.001, 0.0001, or lower).
In some embodiments, the frequency of a sequence variant is sufficiently above background when such frequency is about or at least about 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 25-fold, 50-fold, 100-fold, or more above the background error rate (e.g. at least 5-fold higher). In some embodiments, the background error rate in accurately determining the sequence at a given position is about or less than about 1%, 0.5%, 0.1%, 0.05%, 0.01%, 0.005%, 0.001%, 0.0005%, or lower. In some embodiments, the error rate is lower than 0.001%. Methods for determining frequency and error rate are described herein, such as with regard to any of the various aspects of the disclosure.
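The calling rule above (a difference must appear on at least two distinct polynucleotides, and its frequency must sit sufficiently above background) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: each read-level observation is assumed to carry an identifier of its source molecule (for example its circularization junction), the background error rate is assumed uniform across positions, and significance is assessed with a one-sided binomial test, which is one reasonable choice among several. The names `call_variants` and `binom_sf` are hypothetical; the defaults (at least 2 molecules, 5-fold over background, p ≤ 0.01) are taken from the ranges recited above.

```python
from collections import defaultdict
from math import exp, lgamma, log, log1p


def binom_sf(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), assuming 0 < p < 1.

    Each term is computed in log space so that large n does not overflow.
    """
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)


def call_variants(observations, coverage, background,
                  min_molecules=2, min_fold=5.0, alpha=0.01):
    """Return (position, alt, k, n, frequency, p_value) for each called variant.

    observations: iterable of (position, alt_base, molecule_id) tuples, one per
        read-level observation. molecule_id names the source polynucleotide
        (e.g. its junction), so reads copied from one molecule collapse to one.
    coverage: dict of position -> number of distinct molecules covering it.
    background: assumed per-position background error rate (0 < rate < 1).
    """
    support = defaultdict(set)
    for position, alt, molecule_id in observations:
        support[(position, alt)].add(molecule_id)

    calls = []
    for (position, alt), molecules in sorted(support.items()):
        k, n = len(molecules), coverage[position]
        if k < min_molecules:
            continue  # seen on one molecule only: likely an amplification/sequencing error
        frequency = k / n
        if frequency < min_fold * background:
            continue  # not sufficiently above the background error rate
        p_value = binom_sf(k, n, background)
        if p_value <= alpha:
            calls.append((position, alt, k, n, frequency, p_value))
    return calls


# Hypothetical data: 8 distinct molecules carry T at position 100 (3 reads each);
# 20 reads carry G at position 200, but all come from a single molecule.
obs = [(100, "T", f"junction{i}") for i in range(8) for _ in range(3)]
obs += [(200, "G", "junction99")] * 20
calls = call_variants(obs, {100: 1000, 200: 1000}, background=0.001)
print([(c[0], c[1], c[2]) for c in calls])  # [(100, 'T', 8)]
```

Collapsing reads to distinct molecule identifiers before counting is the key design choice here: the 20 reads supporting position 200 all derive from one junction, so they count as a single molecule and the difference is rejected no matter how many times the error was copied.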

In some embodiments, identifying a genuine sequence variant (also referred to as “calling” or “making a call”) comprises optimally aligning one or more sequencing reads with a reference sequence to identify differences between the two, as well as to identify junctions. In general, alignment involves placing one sequence along another sequence, iteratively introducing gaps along each sequence, scoring how well the two sequences match, and preferably repeating for various positions along the reference. The best-scoring match is deemed to be the alignment and represents an inference about the degree of relationship between the sequences. A variety of alignment algorithms and aligners implementing them are available, non-limiting examples of which are described herein, such as with respect to any of the various aspects of the disclosure. In some embodiments, a reference sequence to which sequencing reads are compared is a known reference sequence, such as a reference genome (e.g. the genome of a member of the same species as the subject). A reference genome may be complete or incomplete. In some embodiments, a reference genome consists only of regions containing target polynucleotides, such as from a reference genome or from a consensus generated from sequencing reads under analysis. In some embodiments, a reference sequence comprises or consists of sequences of polynucleotides of one or more organisms, such as sequences from one or more bacteria, archaea, viruses, protists, fungi, or other organisms. In some embodiments, the reference sequence consists of only a portion of a reference genome, such as regions corresponding to one or more target sequences under analysis (e.g. one or more genes, or portions thereof). For example, for detection of a pathogen (such as in the case of contamination detection), the reference genome is the entire genome of the pathogen (e.g. HIV, HPV, or a harmful bacterial strain such as E. coli), or a portion thereof useful in identification, such as of a particular strain or serotype. In some embodiments, sequencing reads are aligned to multiple different reference sequences, such as to screen for multiple different organisms or strains. Additional non-limiting examples of reference sequences with respect to which sequence differences may be identified (and sequence variants called) are described herein, such as with respect to any of the various aspects of the disclosure.
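The alignment procedure described above (introducing gaps, scoring matches, and keeping the best-scoring placement) is what a global aligner such as Needleman-Wunsch does. A minimal Python sketch follows; the scoring values (match +1, mismatch -1, linear gap -2) are illustrative assumptions rather than parameters from the disclosure, and the helper names are hypothetical.

```python
def needleman_wunsch(ref: str, read: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2):
    """Globally align read to ref; return (aligned_ref, aligned_read, score)."""
    n, m = len(ref), len(read)

    def sub(i: int, j: int) -> int:
        return match if ref[i - 1] == read[j - 1] else mismatch

    # score[i][j] holds the best score for aligning ref[:i] with read[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,   # gap in read (deletion)
                              score[i][j - 1] + gap)   # gap in ref (insertion)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_ref, out_read = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_ref.append(ref[i - 1])
            out_read.append(read[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_ref.append(ref[i - 1])
            out_read.append("-")
            i -= 1
        else:
            out_ref.append("-")
            out_read.append(read[j - 1])
            j -= 1
    return "".join(reversed(out_ref)), "".join(reversed(out_read)), score[n][m]


def differences(aligned_ref: str, aligned_read: str):
    """List (ref_index, ref_char, read_char) for every column that differs.

    ref_index is 0-based; an insertion is reported at the index of the next
    reference base.
    """
    diffs, ref_index = [], 0
    for r, q in zip(aligned_ref, aligned_read):
        if r != q:
            diffs.append((ref_index, r, q))
        if r != "-":
            ref_index += 1
    return diffs


aligned_ref, aligned_read, s = needleman_wunsch("ACGTACGT", "ACGAACGT")
print(s, differences(aligned_ref, aligned_read))  # 6 [(3, 'T', 'A')]
```

Where several alignments tie (for example, a deletion within a homopolymer run), the traceback order above picks one of them; a production caller would normally left-align such indels so the same event is reported consistently across reads.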

In one aspect, the disclosure provides a method of amplifying in a reaction mixture a plurality of different concatemers comprising two or more copies of a target sequence, wherein the target sequence comprises sequence A and sequence B oriented in a 5′ to 3′ direction. In some embodiments, the method comprises subjecting the reaction mixture to a nucleic acid amplification reaction, wherein the reaction mixture comprises: (a) the plurality of concatemers, wherein individual concatemers in the plurality comprise different junctions formed by circularizing individual polynucleotides having a 5′ end and a 3′ end; (b) a first primer comprising sequence A′, wherein the first primer specifically hybridizes to sequence A of the target sequence via sequence complementarity between sequence A and sequence A′; (c) a second primer comprising sequence B, wherein the second primer specifically hybridizes to sequence B′ present in a complementary polynucleotide comprising a complement of the target sequence via sequence complementarity between sequence B and B′; and (d) a polymerase that extends the first primer and the second primer to produce amplified polynucleotides; wherein the distance between the 5′ end of sequence A and the 3′ end of sequence B of the target sequence is 75 nt or less.

In a related aspect, the disclosure provides a method of amplifying in a reaction mixture a plurality of different circular polynucleotides comprising a target sequence, wherein the target sequence comprises sequence A and sequence B oriented in a 5′ to 3′ direction. In some embodiments, the method comprises subjecting the reaction mixture to a nucleic acid amplification reaction, wherein the reaction mixture comprises: (a) the plurality of circular polynucleotides, wherein individual circular polynucleotides in the plurality comprise different junctions formed by circularizing individual polynucleotides having a 5′ end and a 3′ end; (b) a first primer comprising sequence A′, wherein the first primer specifically hybridizes to sequence A of the target sequence via sequence complementarity between sequence A and sequence A′; (c) a second primer comprising sequence B, wherein the second primer specifically hybridizes to sequence B′ present in a complementary polynucleotide comprising a complement of the target sequence via sequence complementarity between sequence B and B′; and (d) a polymerase that extends the first primer and the second primer to produce amplified polynucleotides; wherein sequence A and sequence B are endogenous sequences, and the distance between the 5′ end of sequence A and the 3′ end of sequence B of the target sequence is 75 nt or less.
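The primer arrangement recited above can be checked mechanically. The Python sketch below is an illustration, not a disclosed design tool. It takes 0-based coordinates of sequences A and B on the target, returns the first primer (A′, the reverse complement of A) and the second primer (B), and enforces the 75 nt limit on the distance from the 5′ end of A to the 3′ end of B. It assumes A lies entirely 5′ of B without overlap; the function names and the example target are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def design_b2b_primers(target: str, a_start: int, a_len: int,
                       b_start: int, b_len: int, max_footprint: int = 75):
    """Return (first_primer, second_primer, footprint) for back-to-back primers.

    Coordinates are 0-based on the target written 5'->3'. This sketch assumes
    A lies 5' of B and that the two do not overlap. The first primer is A'
    (reverse complement of A) and the second primer is B itself, so the
    primers extend away from each other.
    """
    a_end, b_end = a_start + a_len, b_start + b_len
    if not (0 <= a_start < a_end <= b_start < b_end <= len(target)):
        raise ValueError("expected A 5' of B, both within the target")
    footprint = b_end - a_start  # 5' end of A to 3' end of B
    if footprint > max_footprint:
        raise ValueError(f"footprint {footprint} nt exceeds {max_footprint} nt")
    first_primer = reverse_complement(target[a_start:a_end])
    second_primer = target[b_start:b_end]
    return first_primer, second_primer, footprint


print(design_b2b_primers("GATTACAGGCTTCAGGATCC", 0, 7, 13, 7))
# ('TGTAATC', 'AGGATCC', 20)
```

Because A′ anneals to the target strand and B anneals to its complement, each primer extends away from the other, so a linear copy of the target alone does not support exponential amplification, while a circular or concatemeric template does.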

Whether amplifying circular polynucleotides or concatemers, such polynucleotides may be from any suitable sample sources (either directly, or indirectly, such as by amplification). A variety of suitable sample sources, optional extraction processes, types of polynucleotides, and types of sequence variants are described herein, such as with respect to any of the various aspects of the disclosure. Circular polynucleotides may be derived from circularizing non-circular polynucleotides. Non-limiting examples of circularization processes (e.g. with and without adapter oligonucleotides), reagents (e.g. types of adapters, use of ligases), reaction conditions (e.g. favoring self-joining), optional additional processing (e.g. post-reaction purification), and the junctions formed thereby are provided herein, such as with regard to any of the various aspects of the disclosure. Concatemers may be derived from amplification of circular polynucleotides. A variety of methods of amplifying polynucleotides (e.g. DNA and/or RNA) are available, non-limiting examples of which have also been described herein. In some embodiments, concatemers are generated by rolling circle amplification of circular polynucleotides.

FIG. 10 illustrates an example arrangement of the first and second primers with respect to a target sequence in the context of a single repeat (which will typically not be amplified unless circular) and concatemers comprising multiple copies of the target sequence. As noted with regard to other aspects described herein, this arrangement of primers may be referred to as “back to back” (B2B) or “inverted” primers. Amplification with B2B primers facilitates enrichment of circular and/or concatemeric templates. Moreover, this orientation combined with a relatively smaller footprint (total distance spanned by a pair of primers) permits amplification of a wider variety of fragmentation events around a target sequence, as a junction is less likely to occur between primers than in the arrangement of primers found in a typical amplification reaction (facing one another, spanning a target sequence). In some embodiments, the distance between the 5′ end of sequence A and the 3′ end of sequence B is about or less than about 200, 150, 100, 75, 50, 40, 30, 25, 20, 15, or fewer nucleotides. In some embodiments, sequence A is the complement of sequence B. In some embodiments, multiple pairs of B2B primers directed to a plurality of different target sequences are used in the same reaction to amplify a plurality of different target sequences in parallel (e.g. about or at least about 10, 50, 100, 150, 200, 250, 300, 400, 500, 1000, 2500, 5000, 10000, 15000, or more different target sequences). Primers can be of any suitable length, such as described elsewhere herein. Amplification may comprise any suitable amplification reaction under appropriate conditions, such as an amplification reaction described herein. In some embodiments, amplification is a polymerase chain reaction.

In some embodiments, B2B primers comprise at least two sequence elements: a first element that hybridizes to a target sequence via sequence complementarity, and a 5′ “tail” that does not hybridize to the target sequence during a first amplification phase at a first hybridization temperature during which the first element hybridizes (e.g. due to lack of sequence complementarity between the tail and the portion of the target sequence immediately 3′ with respect to where the first element binds). For example, the first primer comprises sequence C 5′ with respect to sequence A′, the second primer comprises sequence D 5′ with respect to sequence B, and neither sequence C nor sequence D hybridizes to the plurality of concatemers (or circular polynucleotides) during a first amplification phase at a first hybridization temperature. In some embodiments in which such tailed primers are used, amplification can comprise a first phase and a second phase. The first phase comprises a hybridization step at a first temperature, during which the first and second primers hybridize to the concatemers (or circular polynucleotides), followed by primer extension; the second phase comprises a hybridization step at a second temperature that is higher than the first temperature, during which the first and second primers hybridize to amplification products comprising extended first or second primers, or complements thereof, followed by primer extension. The number of amplification cycles at each of the two temperatures can be adjusted based on the products desired. Typically, the first temperature will be used for a relatively low number of cycles, such as about or less than about 15, 10, 9, 8, 7, 6, 5, or fewer cycles.
The number of cycles at the higher temperature can be selected independently of the number of cycles at the first temperature, but will typically be as many or more cycles, such as about or at least about 5, 6, 7, 8, 9, 10, 15, 20, 25, or more cycles. The higher temperature favors hybridization of the complete primer (first element plus tail) to sites in primer extension products that are complementary to the whole primer, over hybridization of the first element alone to an internal target sequence within a concatemer, which would produce shorter fragments. Accordingly, the two-phase amplification may be used to reduce the extent to which short amplification products might otherwise be favored, thereby maintaining a relatively higher proportion of amplification products having two or more copies of a target sequence. For example, after 5 cycles (e.g. at least 5, 6, 7, 8, 9, 10, 15, 20, or more cycles) of hybridization at the second temperature and primer extension, at least 5% (e.g. at least 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, or more) of amplified polynucleotides in the reaction mixture comprise two or more copies of the target sequence. An embodiment of this two-phase, tailed B2B primer amplification process is illustrated in FIGS. 11A-11D, and a further implementation is illustrated in FIGS. 15A-15C.
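The two-phase cycling described above can be captured as a small program definition. The Python sketch below is illustrative: the `Phase` structure and `two_phase_schedule` function are hypothetical, the only rule it enforces is the one stated above (the second-phase hybridization temperature exceeds the first), and the example temperatures are placeholders rather than validated conditions; only the cycle counts are drawn from the recited ranges.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Phase:
    anneal_temp_c: float  # hybridization temperature, degrees C
    cycles: int           # number of hybridization/extension cycles


def two_phase_schedule(first: Phase, second: Phase) -> List[float]:
    """Expand a two-phase tailed-primer program into per-cycle hybridization temperatures.

    Phase 1 relies on the target-binding element of each primer alone; phase 2
    uses a higher temperature intended to favor hybridization of the full
    (element + tail) primer to earlier extension products.
    """
    if second.anneal_temp_c <= first.anneal_temp_c:
        raise ValueError("second-phase temperature must exceed the first")
    if first.cycles < 1 or second.cycles < 1:
        raise ValueError("each phase needs at least one cycle")
    return ([first.anneal_temp_c] * first.cycles
            + [second.anneal_temp_c] * second.cycles)


# Placeholder temperatures; 5 and 15 cycles fall within the ranges recited above.
schedule = two_phase_schedule(Phase(55.0, 5), Phase(68.0, 15))
print(len(schedule), schedule[0], schedule[-1])  # 20 55.0 68.0
```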

In some embodiments, amplification is performed under conditions that are skewed to increase the length of amplicons from concatemers. For example, the primer concentration can be lowered, such that not every priming site is bound by a primer, thus making the PCR products longer. Similarly, decreasing the primer hybridization time during the cycles allows fewer primers to hybridize, which also increases the average PCR amplicon size. Furthermore, increasing the temperature and/or extension time of the cycles may similarly increase the average length of the PCR amplicons. Any combination of these techniques can be used.
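Why lower primer occupancy lengthens products can be illustrated with a deliberately simplified toy model in Python. It assumes (this is not from the disclosure) that each priming site along a long concatemer is occupied independently with probability p, and that a product spans from one occupied site to the next; under those assumptions the mean product length is roughly 1/p repeat units. The function name is hypothetical, and real PCR kinetics are considerably more complex.

```python
import random
from typing import Optional


def mean_product_units(p_primed: float, n_sites: int = 10_000,
                       seed: int = 0) -> Optional[float]:
    """Toy model: mean product length, in repeat units, between occupied sites.

    Each of n_sites priming sites along a long concatemer is occupied
    independently with probability p_primed (a stand-in for primer
    concentration or hybridization time). A product is assumed to run from
    one occupied site to the next, so the mean is roughly 1 / p_primed.
    """
    rng = random.Random(seed)
    occupied = [i for i in range(n_sites) if rng.random() < p_primed]
    if len(occupied) < 2:
        return None  # too few occupied sites to define a product
    gaps = [b - a for a, b in zip(occupied, occupied[1:])]
    return sum(gaps) / len(gaps)


for p in (0.9, 0.5, 0.2):
    print(p, round(mean_product_units(p), 2))
```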

In some embodiments, particularly where an amplification with B2B primers has been performed, amplification products are filtered on the basis of size to reduce or eliminate monomers in a mixture comprising concatemers. This can be done using a variety of available techniques, including, but not limited to, fragment excision from gels and gel filtration (e.g. to enrich for fragments larger than about 300, 400, 500, or more nucleotides in length), as well as SPRI beads (Agencourt AMPure XP) for size selection by fine-tuning the binding buffer concentration. For example, 0.6× binding buffer mixed with DNA fragments may be used to preferentially bind DNA fragments larger than about 500 base pairs (bp).
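The effect of a length cutoff on the proportion of monomers can be illustrated as follows. This Python sketch models only an idealized hard size threshold (by default the roughly 500 bp cutoff mentioned above); it does not model gel or SPRI bead chemistry, and the amplicon lengths and the 180 bp repeat unit are hypothetical values.

```python
def copies_of_target(length_bp: int, unit_bp: int) -> int:
    """Approximate number of full target copies in an amplicon of this length."""
    return length_bp // unit_bp


def size_select(lengths, min_bp: int = 500):
    """Keep amplicons longer than min_bp (an idealized stand-in for gel/bead selection)."""
    return [length for length in lengths if length > min_bp]


def monomer_fraction(lengths, unit_bp: int) -> float:
    """Fraction of amplicons carrying fewer than two copies of the target."""
    if not lengths:
        return 0.0
    monomers = sum(1 for length in lengths if copies_of_target(length, unit_bp) < 2)
    return monomers / len(lengths)


amplicons = [180, 190, 360, 540, 720, 900]  # hypothetical amplicon lengths, bp
unit = 180  # hypothetical repeat unit (target plus junction), bp
kept = size_select(amplicons)
print(monomer_fraction(amplicons, unit), monomer_fraction(kept, unit))
# 0.3333333333333333 0.0
```

A hard 500 bp cutoff also discards two-copy products shorter than the threshold (360 bp here), which is one reason the cutoff would be tuned to the repeat-unit length in practice.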

In some embodiments, the first primer comprises sequence C 5′ with respect to sequence A′, the second primer comprises sequence D 5′ with respect to sequence B, and neither sequence C nor sequence D hybridizes to the plurality of circular polynucleotides during a first amplification phase at a first hybridization temperature. Amplification may comprise a first phase and a second phase, wherein the first phase comprises a hybridization step at a first temperature, during which the first and second primers hybridize to the circular polynucleotides or amplification products thereof prior to primer extension; and the second phase comprises a hybridization step at a second temperature that is higher than the first temperature, during which the first and second primers hybridize to amplification products comprising extended first or second primers or complements thereof. For example, the first temperature may be selected as about or more than about the Tm of sequence A′, sequence B, or the average of these, or a temperature that is greater than one of these Tm's by 1° C., 2° C., 3° C., 4° C., 5° C., 6° C., 7° C., 8° C., 9° C., 10° C., or more. In this example, the second temperature may be selected to be about or more than about the Tm of the combined sequence (A′+C), the combined sequence (B+D), or the average of these, or a temperature that is greater than one of these Tm's by 1° C., 2° C., 3° C., 4° C., 5° C., 6° C., 7° C., 8° C., 9° C., 10° C., or more. The term “Tm,” also referred to as the “melting temperature,” generally represents the temperature at which 50% of an oligonucleotide consisting of a reference sequence (which may in fact be a sub-sequence within a larger polynucleotide) is hybridized to its complementary sequence and 50% is separated. In general, Tm increases with increasing length; as such, the Tm of sequence A′ is expected to be lower than the Tm of the combined sequence (A′+C).
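The temperature-selection rule above can be made concrete with an approximate Tm formula. The Python sketch below uses the basic GC-content approximation Tm = 64.9 + 41 × (nGC - 16.4)/N. This formula is an assumption swapped in for illustration: it ignores salt, oligonucleotide concentration, and nearest-neighbor effects, and is intended for oligonucleotides longer than 13 nt, so a nearest-neighbor calculator would usually be preferred in practice. The sketch picks T1 from the average Tm of A′ and B and T2 from the average Tm of the tailed primers (C + A′ and D + B), with an optional offset. All sequences and function names are hypothetical.

```python
def basic_tm(seq: str) -> float:
    """Approximate Tm in degrees C for an oligonucleotide longer than 13 nt.

    Basic GC-content formula: Tm = 64.9 + 41 * (nGC - 16.4) / N. It ignores
    salt, oligonucleotide concentration, and nearest-neighbor effects.
    """
    s = seq.upper()
    n_gc = s.count("G") + s.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(s)


def two_phase_temperatures(a_prime: str, b: str, tail_c: str, tail_d: str,
                           offset_c: float = 0.0):
    """Return (T1, T2) using the 'average of the two Tm values' option.

    T1: mean Tm of the target-binding elements A' and B, plus offset_c.
    T2: mean Tm of the complete tailed primers (C + A' and D + B), plus offset_c.
    Tails are written 5' of the target-binding element, as in the primers.
    """
    t1 = (basic_tm(a_prime) + basic_tm(b)) / 2 + offset_c
    t2 = (basic_tm(tail_c + a_prime) + basic_tm(tail_d + b)) / 2 + offset_c
    return t1, t2


# Hypothetical 20-nt elements and 20-nt tails, each 50% GC.
t1, t2 = two_phase_temperatures("ACGT" * 5, "TGCA" * 5, "GATC" * 5, "CTAG" * 5)
print(round(t1, 2), round(t2, 2))  # 51.78 68.59
```

At equal GC content, the longer tailed primer gives the higher estimate, consistent with the statement above that the Tm of A′ is expected to be lower than that of (A′+C).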

In one aspect, the disclosure provides a system for detecting a sequence variant. In some embodiments, the system comprises (a) a computer configured to receive a user request to perform a detection reaction on a sample; (b) an amplification system that performs a nucleic acid amplification reaction on the sample or a portion thereof in response to the user request, wherein the amplification reaction comprises the steps of (i) circularizing individual polynucleotides in a plurality of polynucleotides to form a plurality of circular polynucleotides using a ligase enzyme, each polynucleotide of the plurality having a 5′ end and a 3′ end prior to ligation; (ii) degrading the ligase enzyme; and (iii) amplifying the circular polynucleotides after degrading the ligase enzyme to produce amplified polynucleotides; wherein polynucleotides are not purified or isolated between steps (i) and (iii); (c) a sequencing system that generates sequencing reads for polynucleotides amplified by the amplification system, identifies sequence differences between sequencing reads and a reference sequence, and calls a sequence difference that occurs in at least two circular polynucleotides having different junctions as the sequence variant; and (d) a report generator that sends a report to a recipient, wherein the report contains results for detection of the sequence variant. In some embodiments, the recipient is the user. FIG. 32 illustrates a non-limiting example of a system useful in the methods of the present disclosure. FIGS. 29 and 30 provide illustrative schematics of an exemplary workflow design.

A computer for use in the system can comprise one or more processors. Processors may be associated with one or more controllers, calculation units, and/or other units of a computer system, or implemented in firmware as desired. If implemented in software, the routines may be stored in any computer readable memory such as RAM, ROM, flash memory, a magnetic disk, a laser disk, or other suitable storage medium. Likewise, this software may be delivered to a computing device via any known delivery method including, for example, over a communication channel such as a telephone line, the internet, a wireless connection, etc., or via a transportable medium, such as a computer readable disk, flash drive, etc. The various steps may be implemented as various blocks, operations, tools, modules and techniques which, in turn, may be implemented in hardware, firmware, software, or any combination of hardware, firmware, and/or software. When implemented in hardware, some or all of the blocks, operations, techniques, etc. may be implemented in, for example, a custom integrated circuit (IC), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a programmable logic array (PLA), etc. A client-server, relational database architecture can be used in embodiments of the system. A client-server architecture is a network architecture in which each computer or process on the network is either a client or a server. Server computers are typically powerful computers dedicated to managing disk drives (file servers), printers (print servers), or network traffic (network servers). Client computers include PCs (personal computers) or workstations on which users run applications, as well as example output devices as disclosed herein. Client computers rely on server computers for resources, such as files, devices, and even processing power. In some embodiments, the server computer handles all of the database functionality.
The client computer can have software that handles all the front-end data management and can also receive data input from users.

The system can be configured to receive a user request to perform a detection reaction on a sample. The user request may be direct or indirect. Examples of direct requests include those transmitted by way of an input device, such as a keyboard, mouse, or touch screen. Examples of indirect requests include transmission via a communication medium, such as over the internet (either wired or wireless).

The system can further comprise an amplification system that performs a nucleic acid amplification reaction on the sample or a portion thereof in response to the user request. A variety of methods of amplifying polynucleotides (e.g. DNA and/or RNA) are available. Amplification may be linear, exponential, or involve both linear and exponential phases in a multi-phase amplification process. Amplification methods may involve changes in temperature, such as a heat denaturation step, or may be isothermal processes that do not require heat denaturation. Non-limiting examples of suitable amplification processes are described herein, such as with regard to any of the various aspects of the disclosure. In some embodiments, amplification comprises rolling circle amplification (RCA). A variety of systems for amplifying polynucleotides are available, and may vary based on the type of amplification reaction to be performed. For example, for amplification methods that comprise cycles of temperature changes, the amplification system may comprise a thermocycler. An amplification system can comprise a real-time amplification and detection instrument, such as systems manufactured by Applied Biosystems, Roche, and Stratagene. In some embodiments, the amplification reaction comprises the steps of (i) circularizing individual polynucleotides to form a plurality of circular polynucleotides, each of which has a junction between the 5′ end and 3′ end; and (ii) amplifying the circular polynucleotides. Samples, polynucleotides, primers, polymerases, and other reagents can be any of those described herein, such as with regard to any of the various aspects. Non-limiting examples of circularization processes (e.g. with and without adapter oligonucleotides), reagents (e.g. types of adapters, use of ligases), reaction conditions (e.g. favoring self-joining), optional additional processing (e.g. post-reaction purification), and the junctions formed thereby are provided herein, such as with regard to any of the various aspects of the disclosure. Systems can be selected and/or designed to execute any such methods.

Systems may further comprise a sequencing system that generates sequencing reads for polynucleotides amplified by the amplification system, identifies sequence differences between sequencing reads and a reference sequence, and calls a sequence difference that occurs in at least two circular polynucleotides having different junctions as the sequence variant. The sequencing system and the amplification system may be the same, or comprise overlapping equipment. For example, both the amplification system and sequencing system may utilize the same thermocycler. A variety of sequencing platforms for use in the system are available, and may be selected based on the selected sequencing method. Examples of sequencing methods are described herein. Amplification and sequencing may involve the use of liquid handlers. Several commercially available liquid handling systems can be used to automate these processes (see for example liquid handlers from Perkin-Elmer, Beckman Coulter, Caliper Life Sciences, Tecan, Eppendorf, Apricot Design, and Velocity 11). A variety of automated sequencing machines are commercially available, including sequencers manufactured by Life Technologies (SOLiD platform and pH-based detection), Roche (454 platform), and Illumina (e.g. flow cell based systems, such as Genome Analyzer devices). Transfer between 2, 3, 4, 5, or more automated devices (e.g. between one or more of a liquid handler and a sequencing device) may be manual or automated.

Methods for identifying sequence differences and calling sequence variants with respect to a reference sequence are described herein, such as with regard to any of the various aspects of the disclosure. The sequencing system will typically comprise software for performing these steps in response to an input of sequencing data and input of desired parameters (e.g. selection of a reference genome). Examples of alignment algorithms and aligners implementing these algorithms are described herein, including but not limited to the Needleman-Wunsch algorithm (see e.g. the EMBOSS Needle aligner available at www.ebi.ac.uk/Tools/psa/emboss_needle/nucleotide.html, optionally with default settings), the BLAST algorithm (see e.g. the BLAST alignment tool available at blast.ncbi.nlm.nih.gov/Blast.cgi, optionally with default settings), or the Smith-Waterman algorithm (see e.g. the EMBOSS Water aligner available at www.ebi.ac.uk/Tools/psa/emboss_water/nucleotide.html, optionally with default settings). Optimal alignment may be assessed using any suitable parameters of a chosen algorithm, including default parameters. Such alignment algorithms may form part of the sequencing system.
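For local alignment, the Smith-Waterman recurrence differs from the global recurrence only in flooring each cell at zero and taking the best cell anywhere in the matrix. The following Python sketch computes the score only; its scoring values (match +2, mismatch -1, linear gap -2) are illustrative assumptions and not the EMBOSS Water defaults, and the function name is hypothetical.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at zero lets a local alignment start at any cell.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(smith_waterman_score("TTTTACGTTTTT", "ACGT"))  # 8: ACGT matches exactly
print(smith_waterman_score("AAAA", "CCCC"))          # 0: no local similarity
```

Keeping only two rows of the matrix is enough for the score; recovering the aligned segment itself would require storing the full matrix (or traceback pointers), as in a global aligner.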

The system can further comprise a report generator that sends a report to a recipient, wherein the report contains results for detection of the sequence variant. A report may be generated in real time, such as during a sequencing read or while sequencing data is being analyzed, with periodic updates as the process progresses. In addition, or alternatively, a report may be generated at the conclusion of the analysis. The report may be generated automatically, such as when the sequencing system completes the step of calling all sequence variants. In some embodiments, the report is generated in response to instructions from a user. In addition to the results of detection of the sequence variant, a report may also contain an analysis based on the one or more sequence variants. For example, where one or more sequence variants are associated with a particular contaminant or phenotype, the report may include information concerning this association, such as a likelihood that the contaminant or phenotype is present, at what level, and optionally a suggestion based on this information (e.g. additional tests, monitoring, or remedial measures). The report can take any of a variety of forms. It is envisioned that data relating to the present disclosure can be transmitted over a network or other connection (or any other suitable means for transmitting information, including but not limited to mailing a physical report, such as a print-out) for reception and/or review by a receiver. The receiver can be, but is not limited to, an individual or an electronic system (e.g. one or more computers and/or one or more servers).
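A report of the kind described above might be serialized as structured data for transmission to a recipient. The Python sketch below uses only the standard library; the JSON field names and the input call format are hypothetical choices for illustration, not a format defined by the disclosure.

```python
import json
from datetime import datetime, timezone


def build_report(sample_id: str, reference: str, calls: list) -> str:
    """Serialize variant calls as a JSON report (field names are hypothetical).

    Each call is a dict with 'position', 'alt', 'supporting_molecules'
    (distinct junctions carrying the difference) and 'coverage' (distinct
    molecules covering the position).
    """
    variants = [
        {**call, "frequency": call["supporting_molecules"] / call["coverage"]}
        for call in calls
    ]
    report = {
        "sample_id": sample_id,
        "reference": reference,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "variant_detected": bool(variants),
        "variants": variants,
    }
    return json.dumps(report, indent=2)


print(build_report("S1", "hypothetical_panel_v1",
                   [{"position": 100, "alt": "T",
                     "supporting_molecules": 8, "coverage": 1000}]))
```

Any interpretive content described above (for example, an association with a contaminant or phenotype) would be added as further fields; the sketch deliberately reports only the detection results.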

In one aspect, the disclosure provides a computer-readable medium comprising code that, upon execution by one or more processors, implements a method of detecting a sequence variant. In some embodiments, the implemented method comprises: (a) receiving a customer request to perform a detection reaction on a sample; (b) performing a nucleic acid amplification reaction on the sample or a portion thereof in response to the customer request, wherein the amplification reaction comprises the steps of (i) circularizing individual polynucleotides in a plurality of polynucleotides to form a plurality of circular polynucleotides using a ligase enzyme, wherein each polynucleotide of the plurality of polynucleotides has a 5′ end and a 3′ end prior to ligation; (ii) degrading the ligase enzyme; and (iii) amplifying the circular polynucleotides after degrading the ligase enzyme to produce amplified polynucleotides; wherein polynucleotides are not purified or isolated between steps (i) and (iii); (c) performing a sequencing analysis comprising the steps of (i) generating sequencing reads for polynucleotides amplified in the amplification reaction; (ii) identifying sequence differences between sequencing reads and a reference sequence; and (iii) calling a sequence difference that occurs in at least two circular polynucleotides having different junctions as the sequence variant; and (d) generating a report that contains results for detection of the sequence variant.
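The ordering of the steps in the implemented method can be expressed as a pipeline in which each stage is supplied by the caller. This Python sketch is a structural outline only. Every stage is a hypothetical callable (for example, an instrument driver or an analysis routine), step (a), receiving the request, is assumed to have happened before the function is invoked, and no laboratory behavior is modeled.

```python
from typing import Any, Callable, List


def detect_sequence_variant(
    sample: Any,
    circularize: Callable[[Any], Any],
    degrade_ligase: Callable[[Any], Any],
    amplify: Callable[[Any], Any],
    sequence: Callable[[Any], List[str]],
    find_differences: Callable[[List[str]], Any],
    call_variants: Callable[[Any], Any],
    make_report: Callable[[Any], Any],
) -> Any:
    """Run steps (b) through (d) in order; every stage is a caller-supplied callable.

    The same reaction mixture is passed from circularization to ligase
    degradation to amplification, mirroring the absence of a purification
    step between steps (i) and (iii).
    """
    mixture = circularize(sample)          # (b)(i) ligase-mediated circularization
    mixture = degrade_ligase(mixture)      # (b)(ii) degrade the ligase in the same mixture
    amplified = amplify(mixture)           # (b)(iii) amplify the circular polynucleotides
    reads = sequence(amplified)            # (c)(i) generate sequencing reads
    differences = find_differences(reads)  # (c)(ii) compare reads with a reference
    calls = call_variants(differences)     # (c)(iii) require >= 2 distinct junctions
    return make_report(calls)              # (d) report the detection results


# Stand-in stages that only record the order in which they ran.
report = detect_sequence_variant(
    "S1",
    circularize=lambda s: s + "|circ",
    degrade_ligase=lambda m: m + "|ligase-degraded",
    amplify=lambda m: m + "|amplified",
    sequence=lambda m: [m + "|read"],
    find_differences=lambda reads: reads,
    call_variants=lambda diffs: diffs,
    make_report=lambda calls: {"calls": calls},
)
print(report)  # {'calls': ['S1|circ|ligase-degraded|amplified|read']}
```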

A machine readable medium comprising computer-executable code may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium, or a physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. Volatile storage media include dynamic memory, such as main memory of such a computer platform. Tangible transmission media include coaxial cables, copper wire, and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.

The subject computer-executable code can be executed on any suitable device comprising a processor, including a server, a PC, or a mobile device such as a smartphone or tablet. Any controller or computer optionally includes a monitor, which can be a cathode ray tube (“CRT”) display, a flat panel display (e.g., active matrix liquid crystal display, liquid crystal display, etc.), or others. Computer circuitry is often placed in a box, which includes numerous integrated circuit chips, such as a microprocessor, memory, interface circuits, and others. The box also optionally includes a hard disk drive, a floppy disk drive, a high capacity removable drive such as a writeable CD-ROM, and other common peripheral elements. Inputting devices such as a keyboard, mouse, or touch-sensitive screen optionally provide for input from a user. The computer can include appropriate software for receiving user instructions, either in the form of user input into a set of parameter fields, e.g., in a GUI, or in the form of preprogrammed instructions, e.g., preprogrammed for a variety of different specific operations.

In some embodiments of any of the various aspects disclosed herein, the methods, compositions, and systems have therapeutic applications, such as in the characterization of a patient sample and, optionally, diagnosis of a condition of a subject. Therapeutic applications may also include informing the selection of therapies to which a patient may be most responsive (also referred to as “theranostics”), and actual treatment of a subject in need thereof, based on the results of a method described herein. In particular, methods and compositions disclosed herein may be used to diagnose the presence, progression, and/or metastasis of tumors, especially when the polynucleotides analyzed comprise or consist of cfDNA, ctDNA, cfRNA, or fragmented tumor DNA. In some embodiments, a subject is monitored for treatment efficacy. For example, when ctDNA is monitored over time, a decrease in ctDNA can be used as an indication of efficacious treatment, while an increase can facilitate selection of different treatments or different dosages. Other uses include evaluation of organ rejection in transplant recipients (where an increase in the amount of circulating DNA corresponding to the transplant donor genome is used as an early indicator of transplant rejection), and genotyping/isotyping of pathogen infections, such as viral or bacterial infections. Detection of sequence variants in circulating fetal DNA may be used to diagnose a condition of a fetus.
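No specific decision rule for the ctDNA monitoring described above is given here. As a minimal sketch, assuming serial ctDNA measurements (for example, ng per mL of plasma or a variant allele fraction) ordered from baseline onward, the latest value can be compared with the baseline using a fold-change cutoff. The function name `ctdna_trend` and its default two-fold threshold are hypothetical choices for illustration, not values taken from this disclosure.

```python
from typing import Sequence


def ctdna_trend(levels: Sequence[float], fold_threshold: float = 2.0) -> str:
    """Classify serial ctDNA measurements (earliest first) against the baseline.

    fold_threshold is a hypothetical cutoff used only for illustration.
    """
    if len(levels) < 2:
        raise ValueError("need a baseline and at least one follow-up measurement")
    if fold_threshold <= 1.0:
        raise ValueError("fold_threshold must be greater than 1")
    baseline, latest = levels[0], levels[-1]
    if baseline <= 0:
        raise ValueError("baseline measurement must be positive")
    ratio = latest / baseline
    if ratio <= 1.0 / fold_threshold:
        return "decreasing"  # consistent with efficacious treatment
    if ratio >= fold_threshold:
        return "increasing"  # may prompt a different treatment or dosage
    return "stable"
```

For example, `ctdna_trend([1.2, 0.6, 0.2])` returns `"decreasing"`, while `ctdna_trend([0.5, 0.7, 1.5])` returns `"increasing"`. Comparing only against the baseline, rather than modelling the whole series, keeps the sketch simple.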

As used herein, “treatment,” “treating,” “palliating,” and “ameliorating” are used interchangeably. These terms refer to an approach for obtaining beneficial or desired results including, but not limited to, a therapeutic benefit and/or a prophylactic benefit. By therapeutic benefit is meant any therapeutically relevant improvement in or effect on one or more diseases, conditions, or symptoms under treatment. For prophylactic benefit, the compositions may be administered to a subject at risk of developing a particular disease, condition, or symptom, or to a subject reporting one or more of the physiological symptoms of a disease, even though the disease, condition, or symptom may not yet have manifested. Typically, prophylactic benefit includes reducing the incidence and/or worsening of one or more diseases, conditions, or symptoms under treatment (e.g., as between treated and untreated populations, or between treated and untreated states of a subject). Improving a treatment outcome may include diagnosing a condition of a subject in order to identify the subject as one that will or will not benefit from treatment with one or more therapeutic agents, or other therapeutic intervention (such as surgery). In such diagnostic applications, the overall rate of successful treatment with the one or more therapeutic agents may be improved relative to its effectiveness among patients grouped without diagnosis according to a method of the present disclosure (e.g., an improvement in a measure of therapeutic efficacy of at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or more).

The terms “subject,” “individual,” and “patient” are used interchangeably herein to refer to a vertebrate, preferably a mammal, more preferably a human. Mammals include, but are not limited to, murines, simians, humans, farm animals, sport animals, and pets. Tissues, cells, and their progeny of a biological entity obtained in vivo or cultured in vitro are also encompassed.

The terms “therapeutic agent,” “therapeutic capable agent,” and “treatment agent” are used interchangeably and refer to a molecule or compound that confers some beneficial effect upon administration to a subject. The beneficial effect includes enablement of diagnostic determinations; amelioration of a disease, symptom, disorder, or pathological condition; reducing or preventing the onset of a disease, symptom, disorder, or condition; and generally counteracting a disease, symptom, disorder, or pathological condition.

In some embodiments of the various methods described herein, the sample is from a subject. A subject can be any organism, non-limiting examples of which include plants, animals, fungi, protists, monerans, viruses, mitochondria, and chloroplasts. Sample polynucleotides can be isolated from a subject, such as from a cell sample, tissue sample, bodily fluid sample, or organ sample (or cell cultures derived from any of these), including, for example, cultured cell lines, a biopsy, a blood sample, a cheek swab, or a fluid sample containing a cell (e.g., saliva). In some cases, the sample does not comprise intact cells, is treated to remove cells, or polynucleotides are isolated without a cellular extraction step (e.g., to isolate cell-free polynucleotides, such as cell-free DNA). Other examples of sample sources include blood, urine, feces, nares, the lungs, the gut, other bodily fluids or excretions, materials derived therefrom, or combinations thereof. The subject may be an animal, including but not limited to a cow, a pig, a mouse, a rat, a chicken, a cat, a dog, etc., and is usually a mammal, such as a human. In some embodiments, the sample comprises tumor cells, such as in a sample of tumor tissue from a subject. In some embodiments, the sample is a blood sample or a portion thereof (e.g., blood plasma or serum). Serum and plasma may be of particular interest due to their relative enrichment for tumor DNA, which is associated with the higher rate of malignant cell death among such tissues. A sample may be a fresh sample or a sample subjected to one or more storage processes (e.g., paraffin-embedded samples, particularly formalin-fixed paraffin-embedded (FFPE) samples). In some embodiments, a sample from a single individual is divided into multiple separate samples (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more separate samples) that are subjected to methods of the disclosure independently, such as analysis in duplicate, triplicate, quadruplicate, or more.
Where a sample is from a subject, the reference sequence may also be derived from the subject, such as a consensus sequence from the sample under analysis or the sequence of polynucleotides from another sample or tissue of the same subject. For example, a blood sample may be analyzed for ctDNA mutations, while cellular DNA from another sample (e.g., a buccal or skin sample) is analyzed to determine the reference sequence.
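Deriving a subject-specific reference as a consensus, as described above, can be sketched as a per-position majority vote over reads from the reference sample, followed by listing where a test read departs from that consensus. This sketch assumes the reads are already aligned and padded to a common length, with "N" or "-" marking no-calls; practical pipelines align against a genome assembly and weigh base qualities, which this illustration omits. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

NO_CALL = {"N", "-"}


def consensus_sequence(aligned_reads: List[str]) -> str:
    """Per-position majority-vote consensus of pre-aligned, equal-length reads.

    Positions with no informative base in any read are reported as 'N'.
    Ties go to the base seen first at that position (Counter ordering).
    """
    if not aligned_reads:
        raise ValueError("no reads supplied")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be pre-aligned to a common length")
    consensus = []
    for i in range(length):
        counts = Counter(read[i] for read in aligned_reads if read[i] not in NO_CALL)
        consensus.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(consensus)


def differences(read: str, reference: str) -> Dict[int, str]:
    """0-based positions where a read has a called base that differs from the reference."""
    if len(read) != len(reference):
        raise ValueError("read and reference must be aligned to a common length")
    return {
        i: base
        for i, (base, ref_base) in enumerate(zip(read, reference))
        if base not in NO_CALL and ref_base not in NO_CALL and base != ref_base
    }


normal_reads = ["ACGTAC", "ACGTAC", "ACCTAC"]  # e.g. reads from a buccal sample
reference = consensus_sequence(normal_reads)   # "ACGTAC"
print(differences("ACGTTC", reference))        # {4: 'T'}
```

The single discordant base in the third normal read is outvoted, so it does not enter the reference, while the difference in the test read is reported at position 4.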

Polynucleotides may be extracted from a sample, with or without extraction from cells in the sample, according to any suitable method. A variety of kits are available for extraction of polynucleotides, the selection of which may depend on the type of sample or the type of nucleic acid to be isolated. Examples of extraction methods are provided herein, such as those described with respect to any of the various aspects disclosed herein. In one example, the sample may be a blood sample, such as a sample collected in an EDTA tube (e.g., BD Vacutainer). Plasma can be separated from the peripheral blood cells by centrifugation (e.g., 10 minutes at 1900×g at 4° C.). Plasma separation performed in this way on a 6 mL blood sample will typically yield 2.5 to 3 mL of plasma. Circulating cell-free DNA can be extracted from a plasma sample, such as by using a QIAamp Circulating Nucleic Acid Kit (Qiagen), according to the manufacturer's protocol. DNA may then be quantified (e.g., on an Agilent 2100 Bioanalyzer with a High Sensitivity DNA kit (Agilent)). As an example, the yield of circulating DNA from such a plasma sample from a healthy person may range from 1 ng to 10 ng per mL of plasma, with significantly more in samples from cancer patients.
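The volumes and concentrations quoted above bound how much template is available for rare-variant detection. The sketch below multiplies them out and converts mass to haploid genome equivalents, assuming roughly 3.3 pg of DNA per haploid human genome (an approximation introduced here, not a figure from this disclosure). The 0.1% allele fraction is an illustrative value at the low end of the error rates discussed in the background; the function names are hypothetical.

```python
# Approximate mass of one haploid human genome, in picograms (assumed value).
HAPLOID_GENOME_PG = 3.3


def cfdna_yield_ng(plasma_ml: float, conc_ng_per_ml: float) -> float:
    """Total cfDNA mass (ng) recovered from a plasma volume at a given concentration."""
    return plasma_ml * conc_ng_per_ml


def genome_equivalents(mass_ng: float) -> float:
    """Convert a DNA mass in ng to approximate haploid genome equivalents."""
    return mass_ng * 1000.0 / HAPLOID_GENOME_PG


def expected_variant_copies(mass_ng: float, allele_fraction: float) -> float:
    """Expected number of variant-bearing genome copies at a given allele fraction."""
    return genome_equivalents(mass_ng) * allele_fraction


if __name__ == "__main__":
    # Ranges from the text: 2.5-3 mL plasma from a 6 mL blood draw,
    # and 1-10 ng cfDNA per mL plasma from a healthy donor.
    low = cfdna_yield_ng(2.5, 1.0)
    high = cfdna_yield_ng(3.0, 10.0)
    print(f"cfDNA yield: {low:.1f}-{high:.1f} ng")
    print(f"genome equivalents: {genome_equivalents(low):.0f}-{genome_equivalents(high):.0f}")
    print(f"copies at 0.1% allele fraction: "
          f"{expected_variant_copies(low, 0.001):.2f}-{expected_variant_copies(high, 0.001):.2f}")
```

With these inputs the script reports a yield of 2.5-30.0 ng, about 758-9091 genome equivalents, and 0.76-9.09 expected variant copies at a 0.1% allele fraction. At the low end of the range, fewer than one variant-bearing molecule would be expected in the whole sample.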

Polynucleotides can also be derived from stored samples, such as frozen or archived samples. One common method for storing samples is to formalin-fix and paraffin-embed them. However, this process is also associated with degradation of nucleic acids. Polynucleotides processed and analyzed from an FFPE sample may include short polynucleotides, such as fragments in the range of 50-200 base pairs, or shorter. A number of techniques exist for the purification of nucleic acids from fixed paraffin-embedded samples, such as those described in WO2007133703, and methods described by Foss et al., Diagnostic Molecular Pathology (1994) 3:148-155 and Paska, C., et al., Diagnostic Molecular Pathology (2004) 13:234-240. Commercially available kits may be used for purifying polynucleotides from FFPE samples, such as Ambion's RecoverAll Total Nucleic Acid Isolation Kit. Typical methods start with a step that removes the paraffin from the tissue via extraction with xylene or another organic solvent, followed by treatment with heat and a protease such as proteinase K, which digests the tissue and proteins and helps to release the genomic material from the tissue. The released nucleic acids can then be captured on a membrane or precipitated from solution and washed to remove impurities; for mRNA isolation, a DNase treatment step is sometimes added to degrade unwanted DNA. Other methods for extracting FFPE DNA are available and can be used in the methods of the present disclosure.

In some embodiments, the plurality of polynucleotides comprises cell-free polynucleotides, such as cell-free DNA (cfDNA), cell-free RNA (cfRNA), circulating tumor DNA (ctDNA), or circulating tumor RNA (ctRNA). Cell-free DNA and cell-free RNA circulate in both healthy and diseased individuals. cfDNA from tumors (ctDNA) is not confined to any specific cancer type, but appears to be a common finding across different malignancies. According to some measurements, the free circulating DNA concentration in plasma is about 14-18 ng/ml in control subjects and about 180-318 ng/ml in patients with neoplasias. Apoptotic and necrotic cell death contribute to cell-free circulating DNA in bodily fluids. For example, significantly increased circulating DNA levels have been observed in the plasma of patients with prostate cancer and other prostate diseases, such as benign prostatic hyperplasia and prostatitis. In addition, circulating tumor DNA is present in fluids originating from the organs where the primary tumor occurs. Thus, breast cancer detection can be achieved in ductal lavages, colorectal cancer detection in stool, lung cancer detection in sputum, and prostate cancer detection in urine or ejaculate. Cell-free DNA may be obtained from a variety of sources. One common source is blood samples of a subject. However, cfDNA or other fragmented DNA may be derived from a variety of other sources. For example, urine and stool samples can be a source of cfDNA, including ctDNA. Cell-free RNA may likewise be obtained from a variety of sources.

In some embodiments, polynucleotides are subjected to subsequent steps (e.g., circularization and amplification) without an extraction step and/or without a purification step. For example, a fluid sample may be treated to remove cells without an extraction step to produce a purified liquid sample and a cell sample, followed by isolation of DNA from the purified liquid sample. A variety of procedures for isolation of polynucleotides are available, such as precipitation, or non-specific binding to a substrate followed by washing the substrate to release bound polynucleotides. Where polynucleotides are isolated from a sample without a cellular extraction step, the polynucleotides will largely be extracellular or “cell-free” polynucleotides. For example, cell-free polynucleotides may include cell-free DNA (also called “circulating” DNA). In some embodiments, the circulating DNA is circulating tumor DNA (ctDNA) from tumor cells, such as from a body fluid or excretion (e.g., a blood sample). Cell-free polynucleotides may include cell-free RNA (also called “circulating” RNA). In some embodiments, the circulating RNA is circulating tumor RNA (ctRNA) from tumor cells. Tumors frequently show apoptosis or necrosis, such that tumor nucleic acids are released into the body, including the bloodstream of a subject, through a variety of mechanisms, in different forms, and at different levels. Typically, ctDNA ranges from higher concentrations of smaller fragments, generally 70 to 200 nucleotides in length, to lower concentrations of larger fragments of up to thousands of kilobases.

In some embodiments of any of the various aspects described herein, detecting a sequence variant comprises detecting mutations (e.g., rare somatic mutations) with respect to a reference sequence or in a background of no mutations, where the sequence variant is correlated with disease. In general, sequence variants for which there is statistical, biological, and/or functional evidence of association with a disease or trait are referred to as “causal genetic variants.” A single causal genetic variant can be associated with more than one disease or trait. In some embodiments, a causal genetic variant can be associated with a Mendelian trait, a non-Mendelian trait, or both. Causal genetic variants can manifest as variations in a polynucleotide, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more sequence differences (such as between a polynucleotide comprising the causal genetic variant and a polynucleotide lacking the causal genetic variant at the same relative genomic position). Non-limiting examples of types of causal genetic variants include single nucleotide polymorphisms (SNP), deletion/insertion polymorphisms (DIP), copy number variants (CNV), short tandem repeats (STR), restriction fragment length polymorphisms (RFLP), simple sequence repeats (SSR), variable number of tandem repeats (VNTR), randomly amplified polymorphic DNA (RAPD), amplified fragment length polymorphisms (AFLP), inter-retrotransposon amplified polymorphisms (IRAP), long and short interspersed elements (LINE/SINE), long tandem repeats (LTR), mobile elements, retrotransposon microsatellite amplified polymorphisms, retrotransposon-based insertion polymorphisms, sequence specific amplified polymorphism, and heritable epigenetic modification (for example, DNA methylation). A causal genetic variant may also be a set of closely related causal genetic variants. Some causal genetic variants may exert influence as sequence variations in RNA polynucleotides.
At this level, some causal genetic variants are also indicated by the presence or absence of a species of RNA polynucleotides. Also, some causal genetic variants result in sequence variations in protein polypeptides. A number of causal genetic variants have been reported. An example of a causal genetic variant that is a SNP is the Hb S variant of hemoglobin that causes sickle cell anemia. An example of a causal genetic variant that is a DIP is the deltaF508 mutation of the CFTR gene, which causes cystic fibrosis. An example of a causal genetic variant that is a CNV is trisomy 21, which causes Down syndrome. An example of a causal genetic variant that is an STR is the tandem repeat expansion that causes Huntington's disease. Non-limiting examples of causal genetic variants and diseases with which they are associated are provided in Table 1. Additional non-limiting examples of causal genetic variants are described in WO2014015084. Further examples of genes in which mutations are associated with diseases, and in which sequence variants may be detected according to a method of the disclosure, are provided in Table 2.
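The distinction drawn above between SNP-type and DIP-type variants can be illustrated with a small classifier over reference and alternate allele strings. This is a minimal sketch assuming VCF-style, left-anchored alleles (an insertion or deletion shares its leading base with the other allele); the function name and category labels are hypothetical, and larger classes such as CNVs or STR expansions are outside its scope.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a small variant from its reference and alternate alleles.

    Assumes VCF-style left-anchored alleles, e.g. ref="A", alt="AT" for an insertion.
    """
    if not ref or not alt:
        raise ValueError("alleles must be non-empty")
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "reference"
    if len(ref) == len(alt):
        # Equal length: a single mismatched base is an SNV; several form an MNV.
        mismatches = sum(r != a for r, a in zip(ref, alt))
        return "SNV" if mismatches == 1 else "MNV"
    if len(ref) < len(alt):
        # Alternate allele extends the anchored reference: insertion.
        return "insertion" if alt.startswith(ref) else "complex"
    # Reference extends the anchored alternate allele: deletion.
    return "deletion" if ref.startswith(alt) else "complex"
```

For instance, `classify_variant("ATCT", "A")` returns `"deletion"`, the category into which a three-base in-frame deletion such as the CFTR deltaF508 example would fall (the allele strings here are illustrative, not the CFTR sequence). Allele pairs that are neither a clean insertion nor a clean deletion are reported as `"complex"` rather than forced into either class.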

TABLE 1
Disease | Gene | Variant Name
21-Hydroxylase Deficiency | CYP21A2 | F306 + t
21-Hydroxylase Deficiency | CYP21A2 | F306 + t
21-Hydroxylase Deficiency | CYP21A3 | g.655A/C > G
21-Hydroxylase Deficiency | CYP21A4 | g.655A/C > G
21-Hydroxylase Deficiency | CYP21A6 | G110del8nt
21-Hydroxylase Deficiency | CYP21A5 | G110del8nt
21-Hydroxylase Deficiency | CYP21A7 | I172N, rs34607927
21-Hydroxylase Deficiency | CYP21A2 | I236N
21-Hydroxylase Deficiency | CYP21A2 | M239K, rs6476
21-Hydroxylase Deficiency | CYP21A2 | P30L
21-Hydroxylase Deficiency | CYP21A2 | P453S
21-Hydroxylase Deficiency | CYP21A2 | Q318X
21-Hydroxylase Deficiency | CYP21A2 | R356W
21-Hydroxylase Deficiency | CYP21A2 | V237E, rs12530380
21-Hydroxylase Deficiency | CYP21A2 | V281L, rs6471
ABCC8-Related Hyperinsulinism | ABCC8 | 3992-9G > A
ABCC8-Related Hyperinsulinism | ABCC8 | delF1388
ABCC8-Related Hyperinsulinism | ABCC8 | delF1388
ABCC8-Related Hyperinsulinism | ABCC8 | V187D
Achondroplasia | FGFR3 | G375C
Achondroplasia | FGFR3 | G380R, rs28931614
Achromatopsia | CNGB3 | c.1148delC
Achromatopsia | CNGB3 | c.1148delC
Achromatopsia | CNGB3 | c.819-826del8
Achromatopsia | CNGB3 | c.819-826del8
Achromatopsia | CNGB3 | c.886-896del11insT
Achromatopsia | CNGB3 | c.886-896del11insT
Achromatopsia | CNGB3 | c.991-3T > G
Achromatopsia | CNGB3 | p.Arg403Gln
Achromatopsia | CNGB3 | p.Glu336X
Adenosine Monophosphate Deaminase 1 | AMPD1 | P48L
Adenosine Monophosphate Deaminase 1 | AMPD1 | Q12X, rs17602729
Agenesis of Corpus Callosum with Neuronopathy | SLC12A6 | c.2436delG
Agenesis of Corpus Callosum with Neuronopathy | SLC12A6 | c.2436delG
Alkaptonuria | HGD | c.174delA
Alkaptonuria | HGD | c.174delA
Alkaptonuria | HGD | c.457_458insG
Alkaptonuria | HGD | c.457_458insG
Alkaptonuria | HGD | G161R
Alkaptonuria | HGD | G270R
Alkaptonuria | HGD | IVS1 −1G > A
Alkaptonuria | HGD | IVS5 + 1G > A
Alkaptonuria | HGD | Met368Val
Alkaptonuria | HGD | P230S
Alkaptonuria | HGD | S47L
Alkaptonuria | HGD | V300G
Alpha-1-Antitrypsin Deficiency | SERPINA1 | Arg101His, rs709932
Alpha-1-Antitrypsin Deficiency | SERPINA1 | Glu264Val
Alpha-1-Antitrypsin Deficiency | SERPINA1 | Glu342Lys, rs28929474
Alpha-1-Antitrypsin Deficiency | SERPINA1 | Glu376Asp, rs1303
Alpha-Mannosidosis | MAN2B1 | IVS14 + 1G > C
Alpha-Mannosidosis | MAN2B1 | p.L809P
Alpha-Mannosidosis | MAN2B1 | p.R750W
Alpha-Sarcoglycanopathy | SGCA | R77C, rs28933693
Alpha-Thalassemia | HBA2 | H19D
Alpha-Thalassemia | HBA1 | HbQ
Alpha-Thalassemia | HBA1, HBA2 | 3.7 kb (type I) deletion alpha-2
Alpha-Thalassemia | HBA1, HBA2 | 3.7 kb (type I) deletion alpha-2
Alpha-Thalassemia | HBA1, HBA2 | HbVar database id # 1086, --(SEA); deletion of ~20 kb including both alpha-globin genes alpha-Thal-1
Alpha-Thalassemia | HBA1, HBA2 | HbVar database id # 1086, --(SEA); deletion of ~20 kb including both alpha-globin genes alpha-Thal-1
Alpha-Thalassemia | HBA1, HBA2 | HbVar database id # 1087, --(MED-I); deletion of ~17.5 kb including both alpha-globin genes alpha-Thal-1
Alpha-Thalassemia | HBA1, HBA2 | HbVar database id # 1087, --(MED-I); deletion of ~17.5 kb including both alpha-globin genes alpha-Thal-1
Alpha-Thalassemia | HBA1, HBA2 | HbVar database id # 1088, -(alpha)20.5; this 20.5 kb deletion involves alpha2 and the 5′ end of alpha1; alpha-Thal-1
Alpha-Thalassemia | HBA1, HBA2 | HbVar database id # 1088, -(alpha)20.5; this 20.5 kb deletion involves alpha2 and the 5′ end of alpha1; alpha-Thal-1
Alpha-Thalassemia | HBA1, HBA2 | HbVar database id # 1094, --(FIL); a deletion of 30-34 kb involving the alpha1, alpha2, and zeta genes alpha-Thal-1
Alpha-Thalassemia | HBA1, HBA2 | HbVar database id # 1094, --(FIL); a deletion of 30-34 kb involving the alpha1, alpha2, and zeta genes alpha-Thal-1
Alpha-Thalassemia | HBA1, HBA2 | HbVar database id # 1095, --(THAI); a deletion of 34-38 kb involving the alpha1, alpha2, and zeta genes alpha-Thal-1
Alpha-Thalassemia | HBA1, HBA2 | HbVar database id # 1095, --(THAI); a deletion of 34-38 kb involving the alpha1, alpha2, and zeta genes alpha-Thal-1
Alpha-Thalassemia | HBA1, HBA2 | HbVar database id # 1097, --(MED-II); a deletion of 26.5 kb involving the two alpha and zeta genes alpha-Thal-1
Alpha-Thalassemia | HBA1, HBA2 | HbVar database id # 1097, --(MED-II); a deletion of 26.5 kb involving the two alpha and zeta genes alpha-Thal-1
Alpha-Thalassemia | HBA2 | HbVar database id # 187
Alpha-Thalassemia | HBA2 | HbVar database id # 2598, IVSI-5 (G > A)
Alpha-Thalassemia | HBA1, HBA2 | HbVar database id # 703
Alpha-Thalassemia | HBA1, HBA2 | HbVar database id # 704
Alpha-Thalassemia | HBA1, HBA2 | HbVar database id # 705, Hb Koya Dora
Alpha-Thalassemia | HBA1, HBA2 | HbVar database id # 707, rs41412046
Alpha-Thalassemia | HBA1 | HbVar database id # 87
Alpha-Thalassemia | HBA1, HBA2 | HbVar database id # 969, Poly A (A−>G); AATAAA−>AATGAA beta+
Alpha-Thalassemia | HBA1, HBA2 | HbVar database id # 971, Poly A (A−>G); AATAAA−>AATAAG beta+
Alpha-Thalassemia | HBA2 | M1T
Alpha-Thalassemia | HBA1 | W14X
Angiotensin II Receptor, Type 1 | AGTR1 | A1166C
Apolipoprotein E Genotyping | APOE | p.C112R, rs429358
Apolipoprotein E Genotyping | APOE | p.R158C, rs7412
Argininosuccinicaciduria | ASL | R385C
ARSACS | SACS | 5254C > T
ARSACS | SACS | 6594delT
ARSACS | SACS | 6594delT
Aspartylglycosaminuria | AGA | c.199_200delGA
Aspartylglycosaminuria | AGA | c.199_200delGA
Aspartylglycosaminuria | AGA | C163S
Ataxia with Vitamin E Deficiency | TTPA | 744delA
Ataxia with Vitamin E Deficiency | TTPA | 744delA
Ataxia-Telangiectasia | ATM | R35X
Autoimmune Polyendocrinopathy Syndrome Type 1 | AIRE | c.1163_1164insA
Autoimmune Polyendocrinopathy Syndrome Type 1 | AIRE | c.1163_1164insA
Autoimmune Polyendocrinopathy Syndrome Type 1 | AIRE | c.769C > T
Autoimmune Polyendocrinopathy Syndrome Type 1 | AIRE | c.967_979del
Autoimmune Polyendocrinopathy Syndrome Type 1 | AIRE | c.967_979del
Autoimmune Polyendocrinopathy Syndrome Type 1 | AIRE | Y85C
Bardet-Biedl Syndrome | BBS1 | M390R
Bardet-Biedl Syndrome | BBS10 | p.C91LfsX4
Bardet-Biedl Syndrome | BBS10 | p.C91LfsX4
Best Vitelliform Macular Dystrophy | BEST1 | c.G383C
Beta-Sarcoglycanopathy | SGCB | S114F
Beta-Thalassemia | HBB | −28 (A−>G) beta+
Beta-Thalassemia | HBB | −29 (A−>G) beta+
Beta-Thalassemia | HBB | −29A > G
Beta-Thalassemia | HBB | −30 (T−>A) beta+
Beta-Thalassemia | HBB | −87 (C−>G) beta+
Beta-Thalassemia | HBB | −88C > T
Beta-Thalassemia | HBB | CAP + 1 (A−>C) beta+
Beta-Thalassemia | HBB | Codon 15 (G−>A); TGG(Trp)−> TAG(stop codon) beta0, rs34716011
Beta-Thalassemia | HBB | Codon 15 (G−>A); TGG(Trp)−> TAG(stop codon) beta0, rs34716011
Beta-Thalassemia | HBB | Codon 16 (−C); GGC(Gly)−> GG- beta0
Beta-Thalassemia | HBB | Codon 16 (−C); GGC(Gly)−> GG- beta0
Beta-Thalassemia | HBB | Codon 17 (A−>T); AAG(Lys)−> TAG(stop codon) beta0
Beta-Thalassemia | HBB | Codon 24 (T−>A); GGT(Gly)−> GGA(Gly) beta+
Beta-Thalassemia | HBB | Codon 39 (C−>T); CAG(Gln)−> TAG(stop codon) beta0
Beta-Thalassemia | HBB | Codon 5 (−CT); CCT(Pro)−>C-- beta0
Beta-Thalassemia | HBB | Codon 5 (−CT); CCT(Pro)−>C-- beta0
Beta-Thalassemia | HBB | Codon 6 (−A); GAG(Glu)−>G-G beta0
Beta-Thalassemia | HBB | Codon 6 (−A); GAG(Glu)−>G-G beta0
Beta-Thalassemia | HBB | Codon 8 (−AA); AAG(Lys)−>--G beta0
Beta-Thalassemia | HBB | Codon 8 (−AA); AAG(Lys)−>--G beta0
Beta-Thalassemia | HBB | Codons 41/42 (−TTCT); TTCTTT(Phe-Phe)−>----TT beta0
Beta-Thalassemia | HBB | Codons 41/42 (−TTCT); TTCTTT(Phe-Phe)−>----TT beta0
Beta-Thalassemia | HBB | Codons 71/72 (+A); TTT AGT(Phe Ser)−>TTT A AGT; beta0
Beta-Thalassemia | HBB | Codons 71/72 (+A); TTT AGT(Phe Ser)−>TTT A AGT; beta0
Beta-Thalassemia | HBB | Codons 8/9 (+G); AAG TCT(Lys; Ser)−>AAGG TCT beta0
Beta-Thalassemia | HBB | Codons 8/9 (+G); AAG TCT(Lys; Ser)−>AAGG TCT beta0
Beta-Thalassemia | HBB | HbVar database id # 889, IVS-II-654 (C−>T); AAGGCAATA−> AAG^GTAATA beta+(severe)
Beta-Thalassemia | HBB | HbVar database id # 890, IVS-II-705 (T−>G); GATGTAAGA−> GAG^GTAAGA beta+
Beta-Thalassemia | HBB | HbVar database id # 891, IVS-II-745 (C−>G); CAGCTACCAT−> CAG^GTACCAT beta+
Beta-Thalassemia | HBB | HbVar database id # 979, 619 bp deletion beta0
Beta-Thalassemia | HBB | 619 bp deletion beta0
Beta-Thalassemia | HBB | IVS-I-1 (G−>A); AG^GTTGGT−> AGATTGGT beta0
Beta-Thalassemia | HBB | IVS-I-1 (G−>T); AG^GTTGGT−> AGTTTGGT beta0
Beta-Thalassemia | HBB | IVS-I-110 (G−>A) beta+; the mutation is 21 nucleotides 5′ to the acceptor splice site AG^GC
Beta-Thalassemia | HBB | IVS-I-5 (G−>C) beta+(severe)
Beta-Thalassemia | HBB | IVS-II-1 (G−>A); beta0
Beta-Thalassemia | HBB | IVS-II-844 (C−>G); beta+
Beta-Thalassemia | HBB | IVS1 + 6T > C
Beta-Thalassemia | HBB | IVS11 − 849A > C
Beta-Thalassemia | HBB | IVS11 − 849A > G
Biotinidase Deficiency | BTD | A171T, rs13073139
Biotinidase Deficiency | BTD | D252G, rs28934601
Biotinidase Deficiency | BTD | D444H, rs13078881
Biotinidase Deficiency | BTD | F403V
Biotinidase Deficiency | BTD | G98:d7i3
Biotinidase Deficiency | BTD | G98:d7i3
Biotinidase Deficiency | BTD | Q456H
Biotinidase Deficiency | BTD | R157H
Biotinidase Deficiency | BTD | R538C
Blau Syndrome | NOD2 | E383K
Blau Syndrome | NOD2 | L469F
Blau Syndrome | NOD2 | R334Q
Blau Syndrome | NOD2 | R334W
Bloom Syndrome | BLM | 2407insT
Bloom Syndrome | BLM | 2407insT
Bloom Syndrome | BLM | 736delATCTGAinsTAGATTC (2281del6/ins7)
Bloom Syndrome | BLM | 736delATCTGAinsTAGATTC (2281del6/ins7)
BRCA1 Hereditary Breast/Ovarian Cancer | BRCA1 | 185delAG
BRCA1 Hereditary Breast/Ovarian Cancer | BRCA1 | 185delAG
BRCA1 Hereditary Breast/Ovarian Cancer | BRCA1 | 5382insC
BRCA1 Hereditary Breast/Ovarian Cancer | BRCA1 | 5382insC
BRCA1 Hereditary Breast/Ovarian Cancer | BRCA1 | Tyr978X
BRCA2 Hereditary Breast/Ovarian Cancer | BRCA2 | 6174delT
BRCA2 Hereditary Breast/Ovarian Cancer | BRCA2 | 6174delT
BRCA2 Hereditary Breast/Ovarian Cancer | BRCA2 | 8765delAG
BRCA2 Hereditary Breast/Ovarian Cancer | BRCA2 | 8765delAG
Canavan Disease | ASPA | A305E (914C > A), rs28940574
Canavan Disease | ASPA | E285A (854A > C), rs28940279
Canavan Disease | ASPA | IVS2-2A > G (433-2A > G)
Canavan Disease | ASPA | Y231X (693 C > A)
Carnitine Palmitoyltransferase IA Deficiency | CPT1A | G710E
Carnitine Palmitoyltransferase IA Deficiency | CPT1A | P479L
Carnitine Palmitoyltransferase II Deficiency | CPT2 | G549D
Carnitine Palmitoyltransferase II Deficiency | CPT2 | L178F 534 ins/25 bp del
Carnitine Palmitoyltransferase II Deficiency | CPT2 | L178F 534 ins/25 bp del
Carnitine Palmitoyltransferase II Deficiency | CPT2 | P227L
Carnitine Palmitoyltransferase II Deficiency | CPT2 | P50H
Carnitine Palmitoyltransferase II Deficiency | CPT2 | P604S
Carnitine Palmitoyltransferase II Deficiency | CPT2 | Q413fs
Carnitine Palmitoyltransferase II Deficiency | CPT2 | Q413fs
Carnitine Palmitoyltransferase II Deficiency | CPT2 | Q550R
Carnitine Palmitoyltransferase II Deficiency | CPT2 | R124X
Carnitine Palmitoyltransferase II Deficiency | CPT2 | R503C
Carnitine Palmitoyltransferase II Deficiency | CPT2 | R631C
Carnitine Palmitoyltransferase II Deficiency | CPT2 | S113L
Carnitine Palmitoyltransferase II Deficiency | CPT2 | s38fs
Carnitine Palmitoyltransferase II Deficiency | CPT2 | s38fs
Carnitine Palmitoyltransferase II Deficiency | CPT2 | Y628S, rs28936673
Cartilage-Hair Hypoplasia | RMRP | g.262G > T
Cartilage-Hair Hypoplasia | RMRP | g.70A > G
CFTR-Related Disorders | CFTR | 1811 + 1.6 kb A−>G
CFTR-Related Disorders | CFTR | 2183AA > G
CFTR-Related Disorders | CFTR | 2183AA > G
CFTR-Related Disorders | CFTR | 3849 + 10 kb C > T
CFTR-Related Disorders | CFTR | A455E
CFTR-Related Disorders | CFTR | A559T
CFTR-Related Disorders | CFTR | C524X
CFTR-Related Disorders | CFTR | 574delA, 574delA
CFTR-Related Disorders | CFTR | 574delA, 574delA
CFTR-Related Disorders | CFTR | 2108delA, 2108delA
CFTR-Related Disorders | CFTR | 2108delA, 2108delA
CFTR-Related Disorders | CFTR | 3171delC, 3171delC
CFTR-Related Disorders | CFTR | 3171delC, 3171delC
CFTR-Related Disorders | CFTR | 621 + 1G−>T
CFTR-Related Disorders | CFTR | 2105-2117del13insAGAAA
CFTR-Related Disorders | CFTR | 2105-2117del13insAGAAA
CFTR-Related Disorders | CFTR | 711 + 1G−>T
CFTR-Related Disorders | CFTR | 711 + 5G−>A
CFTR-Related Disorders | CFTR | 712 − 1G−>T
CFTR-Related Disorders | CFTR | 1288insTA, 1288insTA
CFTR-Related Disorders | CFTR | 1288insTA, 1288insTA
CFTR-Related Disorders | CFTR | 936delTA
CFTR-Related Disorders | CFTR | 936delTA
CFTR-Related Disorders | CFTR | [delta]F311
CFTR-Related Disorders | CFTR | [delta]F311
CFTR-Related Disorders | CFTR | 1078delT, 1078delT
CFTR-Related Disorders | CFTR | 1078delT, 1078delT
CFTR-Related Disorders | CFTR | 1161delC, 1161delC
CFTR-Related Disorders | CFTR | 1161delC, 1161delC
CFTR-Related Disorders | CFTR | 1609delCA, 1609delCA
CFTR-Related Disorders | CFTR | 1609delCA, 1609delCA
CFTR-Related Disorders | CFTR | [delta]I507
CFTR-Related Disorders | CFTR | [delta]I507
CFTR-Related Disorders | CFTR | rs332, [delta]F508
CFTR-Related Disorders | CFTR | rs332, [delta]F508
CFTR-Related Disorders | CFTR | 1677delTA, 1677delTA
CFTR-Related Disorders | CFTR | 1677delTA, 1677delTA
CFTR-Related Disorders | CFTR | 1717 − 1G−>A
CFTR-Related Disorders | CFTR | 1812 − 1G−>A
CFTR-Related Disorders | CFTR | 1898 + 1G−>A
CFTR-Related Disorders | CFTR | 1898 + 1G−>T
CFTR-Related Disorders | CFTR | 1898 + 5G−>T
CFTR-Related Disorders | CFTR | 1949del84, 1949del84
CFTR-Related Disorders | CFTR | 1949del84, 1949del84
CFTR-Related Disorders | CFTR | 2043delG, 2043delG
CFTR-Related Disorders | CFTR | 2043delG, 2043delG
CFTR-Related Disorders | CFTR | 2055del9−>A
CFTR-Related Disorders | CFTR | 2055del9−>A
CFTR-Related Disorders | CFTR | 2143delT, 2143delT
CFTR-Related Disorders | CFTR | 2143delT, 2143delT
CFTR-Related Disorders | CFTR | 2184delA, 2184delA
CFTR-Related Disorders | CFTR | 2184delA, 2184delA
CFTR-Related Disorders | CFTR | 2184insA, 2184insA
CFTR-Related Disorders | CFTR | 2184insA, 2184insA
CFTR-Related Disorders | CFTR | 2307insA, 2307insA
CFTR-Related Disorders | CFTR | 2307insA, 2307insA
CFTR-Related Disorders | CFTR | 296 + 12T−>C
CFTR-Related Disorders | CFTR | 2789 + 5G−>A
CFTR-Related Disorders | CFTR | 2869insG, 2869insG
CFTR-Related Disorders | CFTR | 2869insG, 2869insG
CFTR-Related Disorders | CFTR | 3120G−> A
CFTR-Related Disorders | CFTR | 3120 + 1G−>A
CFTR-Related Disorders | CFTR | 3272 − 26A−>G
CFTR-Related Disorders | CFTR | 3659delC, 3659delC
CFTR-Related Disorders | CFTR | 3659delC, 3659delC
CFTR-Related Disorders | CFTR | 3667del4, 3667del4
CFTR-Related Disorders | CFTR | 3667del4, 3667del4
CFTR-Related Disorders | CFTR | 3791delC, 3791delC
CFTR-Related Disorders | CFTR | 3791delC, 3791delC
CFTR-Related Disorders | CFTR | 3821delT, 3821delT
CFTR-Related Disorders | CFTR | 3821delT, 3821delT
CFTR-Related Disorders | CFTR | 3905insT, 3905insT
CFTR-Related Disorders | CFTR | 3905insT, 3905insT
CFTR-Related Disorders | CFTR | 4016insT, 4016insT
CFTR-Related Disorders | CFTR | 4016insT, 4016insT
CFTR-Related Disorders | CFTR | 394delTT, 394delTT
CFTR-Related Disorders | CFTR | 394delTT, 394delTT
CFTR-Related Disorders | CFTR | 405 + 1G−>A
CFTR-Related Disorders | CFTR | 405 + 3A−>C
CFTR-Related Disorders | CFTR | 444delA
CFTR-Related Disorders | CFTR | 444delA
CFTR-Related Disorders | CFTR | 3876delA, 3876delA
CFTR-Related Disorders | CFTR | 3876delA, 3876delA
CFTR-Related Disorders | CFTR | 457TAT−>G
CFTR-Related Disorders | CFTR | 457TAT−>G
CFTR-Related Disorders | CFTR | 3199del6, 3199del6
CFTR-Related Disorders | CFTR | 3199del6, 3199del6
CFTR-Related Disorders | CFTR | 406 − 1G−>A
CFTR-Related Disorders | CFTR | 663delT, 663delT
CFTR-Related Disorders | CFTR | 663delT, 663delT
CFTR-Related Disorders | CFTR | 935delA, 935delA
CFTR-Related Disorders | CFTR | 935delA, 935delA
CFTR-Related Disorders | CFTR | CFTR dele2,3 (21 kb)
CFTR-Related Disorders | CFTR | CFTR dele2,3 (21 kb)
CFTR-Related Disorders | CFTR | D1152H
CFTR-Related Disorders | CFTR | E60X
CFTR-Related Disorders | CFTR | E92X
CFTR-Related Disorders | CFTR | F508C, rs1800093
CFTR-Related Disorders | CFTR | G178R
CFTR-Related Disorders | CFTR | G330X
CFTR-Related Disorders | CFTR | G480C
CFTR-Related Disorders | CFTR | G542X
CFTR-Related Disorders | CFTR | G551D
CFTR-Related Disorders | CFTR | G622D
CFTR-Related Disorders | CFTR | G85E
CFTR-Related Disorders | CFTR | G91R
CFTR-Related Disorders | CFTR | I148T, rs35516286
CFTR-Related Disorders | CFTR | I506V
CFTR-Related Disorders | CFTR | IVS8-5T
CFTR-Related Disorders | CFTR | IVS8-7T
CFTR-Related Disorders | CFTR | IVS8-9T
CFTR-Related Disorders | CFTR | K710X
CFTR-Related Disorders | CFTR | L206W
CFTR-Related Disorders | CFTR | M1101K, rs36210737
CFTR-Related Disorders | CFTR | N1303K
CFTR-Related Disorders | CFTR | P574H
CFTR-Related Disorders | CFTR | Q1238X
CFTR-Related Disorders | CFTR | Q359K/T360K_wt
CFTR-Related Disorders | CFTR | Q493X
CFTR-Related Disorders | CFTR | Q552X
CFTR-Related Disorders | CFTR | Q890X
CFTR-Related Disorders | CFTR | R1066C
CFTR-Related Disorders | CFTR | R1070Q
CFTR-Related Disorders | CFTR | R1158X
CFTR-Related Disorders | CFTR | R1162X
CFTR-Related Disorders | CFTR | R117C
CFTR-Related Disorders | CFTR | R117H
CFTR-Related Disorders | CFTR | R1283M
CFTR-Related Disorders | CFTR | R334W
CFTR-Related Disorders | CFTR | R347H
CFTR-Related Disorders | CFTR | R347P
CFTR-Related Disorders | CFTR | R352Q
CFTR-Related Disorders | CFTR | R553X
CFTR-Related Disorders | CFTR | R560T
CFTR-Related Disorders | CFTR | R709X
CFTR-Related Disorders | CFTR | R75X
CFTR-Related Disorders | CFTR | R764X
CFTR-Related Disorders | CFTR | S1196X
CFTR-Related Disorders | CFTR | S1235R, rs34911792
CFTR-Related Disorders | CFTR | S1251N
CFTR-Related Disorders | CFTR | S1255X
CFTR-Related Disorders | CFTR | S364P
CFTR-Related Disorders | CFTR | S549I
CFTR-Related Disorders | CFTR | S549N
CFTR-Related Disorders | CFTR | S549R
CFTR-Related Disorders | CFTR | S549R
CFTR-Related Disorders | CFTR | T338I
CFTR-Related Disorders | CFTR | V520F
CFTR-Related Disorders | CFTR | W1089X
CFTR-Related Disorders | CFTR | W1204X
CFTR-Related Disorders | CFTR | W1204X
CFTR-Related Disorders | CFTR | W1282X
CFTR-Related Disorders | CFTR | Y1092X
CFTR-Related Disorders | CFTR | Y122X
Choroideremia | CHM | c.1609 + 2dupT
Choroideremia | CHM | c.1609 + 2dupT
CLN3-Related Neuronal Ceroid-Lipofuscinosis | CLN3 | c.461_677del
CLN3-Related Neuronal Ceroid-Lipofuscinosis | CLN3 | c.461_677del
CLN3-Related Neuronal Ceroid-Lipofuscinosis | CLN3 | c.791_1056del
CLN3-Related Neuronal Ceroid-Lipofuscinosis | CLN3 | c.791_1056del
CLN5-Related Neuronal Ceroid-Lipofuscinosis | CLN5 | c.1175_1176delAT
CLN5-Related Neuronal Ceroid-Lipofuscinosis | CLN5 | c.1175_1176delAT
CLN5-Related Neuronal Ceroid-Lipofuscinosis | CLN5 | c.225G > A
CLN8-Related Neuronal Ceroid-Lipofuscinosis | CLN8 | c.70C > G
Cohen Syndrome | VPS13B | c.3348_3349delCT
Cohen Syndrome | VPS13B | c.3348_3349delCT
Congenital Cataracts, Facial Dysmorphism, and Neuropathy | CTDP1 | IVS6 + 389C > T
Congenital Disorder of Glycosylation Ia | PMM2 | p.F119L
Congenital Disorder of Glycosylation Ia | PMM2 | p.R141H
Congenital Disorder of Glycosylation Ib | MPI | R295H, rs28928906
Congenital Finnish Nephrosis | NPHS1 | c.121_122del
Congenital Finnish Nephrosis | NPHS1 | c.121_122del
Congenital Finnish Nephrosis | NPHS1 | c.3325C > T
Crohn Disease | NOD2 | 3020 ins C
Crohn Disease | NOD2 | 3020 ins C
Crohn Disease | NOD2 | G908R, rs2066845
Crohn Disease | NOD2 | R702W, rs2066844
Cystinosis | CTNS | 1035insC
Cystinosis | CTNS | 1035insC
Cystinosis | CTNS | 537del21
Cystinosis | CTNS | 537del21
Cystinosis | CTNS | 57 kb deletion
Cystinosis | CTNS | 57 kb deletion
Cystinosis | CTNS | D205N
Cystinosis | CTNS | L158P
Cystinosis | CTNS | W138X
DFNA 9 (COCH) | COCH | P51S
Diabetes and Hearing Loss | mtDNA | 3234A > G
Diabetes and Hearing Loss | mtDNA | 3271T > C
Diabetes and Hearing Loss | mtDNA | G8363A
Diabetes and Hearing Loss | mtDNA | T14709C
Early-Onset Primary Dystonia (DYT1) | TOR1A | 904_906delGAG
Early-Onset Primary Dystonia (DYT1) | TOR1A | 904_906delGAG
Epidermolysis Bullosa Junctional, Herlitz-Pearson Type | LAMB3 | 3024delT
Epidermolysis Bullosa Junctional, Herlitz-Pearson Type | LAMB3 | 3024delT
Epidermolysis Bullosa Junctional, Herlitz-Pearson Type | LAMB3 | p.Q243X
Epidermolysis Bullosa Junctional, Herlitz-Pearson Type | LAMB3 | R144X
Epidermolysis Bullosa Junctional, Herlitz-Pearson Type | LAMB3 | R42X
Epidermolysis Bullosa Junctional, Herlitz-Pearson Type | LAMB3 | R635X
Epidermolysis Bullosa Junctional, Herlitz-Pearson Type | LAMA3 | R650X
Epidermolysis Bullosa Junctional, Herlitz-Pearson Type | LAMC2 | R95X
Factor V Leiden Thrombophilia | F5 | H1299R
Factor V Leiden Thrombophilia | F5 | R506Q, rs6025
Factor V R2 Mutation Thrombophilia | F5 | rs6027
Factor XI Deficiency | F11 | E117X (576G > T)
Factor XI Deficiency | F11 | F283L (1074T > C)
Factor XI Deficiency | F11 | IVS14 + 1G > A
Factor XI Deficiency | F11 | IVS14del14
Factor XI Deficiency | F11 | IVS14del14
Factor XIII Deficiency | F13A1 | V34L, rs5985
Familial Adenomatous Polyposis | APC | I1307K, rs1801155
Familial Dysautonomia | IKBKAP | 2507 + 6T > C
Familial Dysautonomia | IKBKAP | P914L
Familial Dysautonomia | IKBKAP | R696P
Familial Hypercholesterolemia Type B | APOB | R3500Q, rs5742904
Familial Hypercholesterolemia Type B | APOB | R3500W
Familial Hypercholesterolemia Type B | APOB | R3531C, rs12713559
Familial Mediterranean Fever | MEFV | A744S (2230G > T)
Familial Mediterranean Fever | MEFV | delI692 (del2076_2078)
Familial Mediterranean Fever | MEFV | delI692 (del2076_2078)
Familial Mediterranean Fever | MEFV | E148Q (442G > C), rs3743930
Familial Mediterranean Fever | MEFV | E167D (501 G > C)
Familial Mediterranean Fever | MEFV | F479L (1437 C > G)
Familial Mediterranean Fever | MEFV | K695R (2084A > G)
Familial Mediterranean Fever | MEFV | M680I (2040G > C)
Familial Mediterranean Fever | MEFV | M694I (2082G > A), rs28940578
Familial Mediterranean Fever | MEFV | M694V (2080A > G)
Familial Mediterranean Fever | MEFV | P369S (1105 C > T), rs11466023
Familial Mediterranean Fever | MEFV | R408Q (1223G > A), rs11466024
Familial Mediterranean Fever | MEFV | R653H (1958G > A)
Familial Mediterranean Fever | MEFV | R761H (2282G > A)
Familial Mediterranean Fever | MEFV | T267I (800 C > T)
Familial Mediterranean Fever | MEFV | V726A (2177T > C), rs28940579
FANCC-Related Fanconi Anemia | FANCC | 322delG
FANCC-Related Fanconi Anemia | FANCC | 322delG
FANCC-Related Fanconi Anemia | FANCC | IVS4 + 4A > T (711 + 4A > T)
FANCC-Related Fanconi Anemia | FANCC | Q13X (37C > T)
FANCC-Related Fanconi Anemia | FANCC | R547X
FGFR1-Related Craniosynostosis | FGFR1 | P252R
FGFR2-Related Craniosynostosis | FGFR2 | P253R
FGFR2-Related Craniosynostosis | FGFR2 | S252W
FGFR3-Related Craniosynostosis | FGFR3 | A391E, rs28931615
FGFR3-Related Craniosynostosis | FGFR3 | P250R, rs4647924
Free Sialic Acid Storage Disorders | SLC17A5 | c.1007_1008delTA
Free Sialic Acid Storage Disorders | SLC17A5 | c.1007_1008delTA
Free Sialic Acid Storage Disorders | SLC17A5 | c.115C > T
Frontotemporal Dementia with Parkinsonism-17 | MAPT | IVS10 + 16
Frontotemporal Dementia with Parkinsonism-17 | MAPT | P301L
Frontotemporal Dementia with Parkinsonism-17 | MAPT | P301S
Frontotemporal Dementia with Parkinsonism-17 | MAPT | R406W
Fumarase deficiency | FH | c.1431_1433dupAAA
Fumarase deficiency | FH | c.1431_1433dupAAA
Galactosemia | GALT | 5.0Kb gene deletion
Galactosemia | GALT | 5.0Kb gene deletion
Galactosemia | GALT | 5′UTR-119del
Galactosemia | GALT | 5′UTR-119del
Galactosemia | GALT | IVS2-2 A > G
Galactosemia | GALT | K285N
Galactosemia | GALT | L195P T > C
Galactosemia | GALT | L218L
Galactosemia | GALT | N314D, rs2070074
Galactosemia | GALT | Phe171Ser
Galactosemia | GALT | Q169K
Galactosemia | GALT | Q188R
Galactosemia | GALT | S135L
Galactosemia | GALT | T138M C > T
Galactosemia | GALT | X380R
Galactosemia | GALT | Y209C A > G
Gaucher
Disease GBA 1035insGGaucher Disease GBA 1035insG Gaucher Disease GBA 84insG Gaucher DiseaseGBA 84insG Gaucher Disease GBA D409H, rs1064651 Gaucher Disease GBAD409V Gaucher Disease GBA IVS2(+1)G > A Gaucher Disease GBA L444P(1448T > C), rs35095275 Gaucher Disease GBA N370S Gaucher Disease GBAR463C Gaucher Disease GBA R463H Gaucher Disease GBA R496H (1604G > A)Gaucher Disease GBA V394L GJB2-Related DFNA 3 GJB2 167delT NonsyndromicHearing Loss and Deafness GJB2-Related DFNA 3 GJB2 167delT NonsyndromicHearing Loss and Deafness GJB2-Related DFNA 3 GJB2 235delC NonsyndromicHearing Loss and Deafness GJB2-Related DFNA 3 GJB2 235delC NonsyndromicHearing Loss and Deafness GJB2-Related DFNA 3 GJB2 35delG NonsyndromicHearing Loss and Deafness GJB2-Related DFNA 3 GJB2 35delG NonsyndromicHearing Loss and Deafness GJB2-Related DFNA 3 GJB2 IVS1 + 1G > ANonsyndromic Hearing Loss and Deafness GJB2-Related DFNB 1 GJB2 101delAGNonsyndromic Hearing Loss and Deafness GJB2-Related DFNB 1 GJB2 313del14Nonsyndromic Hearing Loss and Deafness GJB2-Related DFNB 1 GJB2 313del14Nonsyndromic Hearing Loss and Deafness GJB2-Related DFNB 1 GJB2 delE120Nonsyndromic Hearing Loss and Deafness GJB2-Related DFNB 1 GJB2 delE120Nonsyndromic Hearing Loss and Deafness GJB2-Related DFNB 1 GJB2 M34T,rs35887622 Nonsyndromic Hearing Loss and Deafness GJB2-Related DFNB 1GJB2 Q124X Nonsyndromic Hearing Loss and Deafness GJB2-Related DFNB 1GJB2 R184P Nonsyndromic Hearing Loss and Deafness GJB2-Related DFNB 1GJB2 V37I Nonsyndromic Hearing Loss and Deafness GJB2-Related DFNB 1GJB2 W24X Nonsyndromic Hearing Loss and Deafness GJB2-Related DFNB 1GJB2 W77R Nonsyndromic Hearing Loss and Deafness GJB2-Related DFNB 1GJB2 W77X Nonsyndromic Hearing Loss and Deafness Glucose-6-PhosphateG6PD A335V Dehydrogenase Deficiency Glucose-6-Phosphate G6PD R459LDehydrogenase Deficiency Glucose-6-Phosphate G6PD R459P DehydrogenaseDeficiency Glucose-6-Phosphate G6PD rs1050828, rs1050828 DehydrogenaseDeficiency Glucose-6-Phosphate G6PD 
rs1050829, rs1050829 DehydrogenaseDeficiency Glucose-6-Phosphate G6PD rs5030868, rs5030868 DehydrogenaseDeficiency Glutaricacidemia Type 1 GCDH A421V Glutaricacidemia Type 1GCDH R402W Glycogen Storage Disease G6PC 459insTA Type 1a GlycogenStorage Disease G6PC 459insTA Type 1a Glycogen Storage Disease G6PC727G/T Type 1a Glycogen Storage Disease G6PC del F327 Type 1a GlycogenStorage Disease G6PC del F327 Type 1a Glycogen Storage Disease G6PCG188R Type 1a Glycogen Storage Disease G6PC G270V Type 1a GlycogenStorage Disease G6PC Q242X Type 1a Glycogen Storage Disease G6PCQ27fsdelC Type 1a Glycogen Storage Disease G6PC Q27fsdelC Type 1aGlycogen Storage Disease G6PC Q347X Type 1a Glycogen Storage DiseaseG6PC R83C Type 1a Glycogen Storage Disease G6PC R83H Type 1a GlycogenStorage Disease G6PT1 1211delCT Type 1b Glycogen Storage Disease G6PT1A367T Type 1b Glycogen Storage Disease G6PT1 G339C Type 1b GlycogenStorage Disease G6PT1 G339D Type 1b Glycogen Storage Disease G6PT1 W118RType 1b Glycogen Storage Disease GAA Arg854X Type II Glycogen StorageDisease GAA Asp645Glu, rs28940868 Type II Glycogen Storage Disease GAAIVS1(−13t > g) Type II Glycogen Storage Disease AGL 1484delT Type IIIGlycogen Storage Disease AGL 1484delT Type III Glycogen Storage DiseaseAGL 17delAG Type III Glycogen Storage Disease AGL 17delAG Type IIIGlycogen Storage Disease AGL Q6X Type III Glycogen Storage Disease PYGMG204S Type V Glycogen Storage Disease PYGM K542T Type V Glycogen StorageDisease PYGM K542X Type V Glycogen Storage Disease PYGM R49X Type VGNE-Related Myopathies GNE M712T, rs28937594 Gracile Syndrome BCS1Lc.232A > G, rs28937590 Hemoglobin S Beta- HBB c.19G > A ThalassemiaHemoglobin S Beta- HBB c.20A > T Thalassemia Hemoglobin S Beta- HBBc.79G > A Thalassemia Hemoglobin S Beta- HBB Hb CS ThalassemiaHemoglobin S Beta- HBB Hb D Thalassemia Hemoglobin S Beta- HBB Hb OThalassemia Hereditary Fructose ALDOB A149P, rs1800546 IntoleranceHereditary Fructose ALDOB A174D Intolerance Hereditary Fructose 
ALDOBDelta4E4 Intolerance Hereditary Fructose ALDOB Delta4E4 IntoleranceHereditary Fructose ALDOB N334K Intolerance Hereditary Fructose ALDOBY203X Intolerance Hereditary Pancreatitis PRSS1 A16V HereditaryPancreatitis SPINK1 M1T Hereditary Pancreatitis PRSS1 N29I HereditaryPancreatitis SPINK1 N34S, rs17107315 Hereditary Pancreatitis PRSS1 R122CHereditary Pancreatitis PRSS1 R122H Hereditary Thymine- DPYD rs3918290Uraciluria Hexosaminidase A HEXA 1278insTATC Deficiency Hexosaminidase AHEXA 1278insTATC Deficiency Hexosaminidase A HEXA G269S (805G > A)Deficiency Hexosaminidase A HEXA IVS12 + 1G > C DeficiencyHexosaminidase A HEXA IVS7 + 1G > A Deficiency Hexosaminidase A HEXAIVS9 + 1G > A Deficiency Hexosaminidase A HEXA R178C DeficiencyHexosaminidase A HEXA R178H Deficiency Hexosaminidase A HEXA R247W(739C > T) Deficiency Hexosaminidase A HEXA R249W (745C > T) DeficiencyHFE-Associated Hereditary HFE E168Q Hemochromatosis HFE-AssociatedHereditary HFE E168X Hemochromatosis HFE-Associated Hereditary HFEHM971246, H63H Hemochromatosis HFE-Associated Hereditary HFE P160delCHemochromatosis HFE-Associated Hereditary HFE P160delC HemochromatosisHFE-Associated Hereditary HFE Q127H, rs28934595 HemochromatosisHFE-Associated Hereditary HFE Q283P Hemochromatosis HFE-AssociatedHereditary HFE rs1799945, rs1799945 Hemochromatosis HFE-AssociatedHereditary HFE rs1800562, rs1800562 Hemochromatosis HFE-AssociatedHereditary HFE rs1800730, rs1800730 Hemochromatosis HFE-AssociatedHereditary HFE V53M Hemochromatosis HFE-Associated Hereditary HFE V59MHemochromatosis HFE-Associated Hereditary HFE W169X HemochromatosisHidrotic Ectodermal GJB6 A88V, rs28937872 Dysplasia 2 HidroticEctodermal GJB6 G11R Dysplasia 2 Hidrotic Ectodermal GJB6 V37E Dysplasia2 Homocystinuria Caused by CBS G307S 919G−>A Cystathionine Beta-Synthase Deficiency Homocystinuria Caused by CBS I278T 833T−>C,rs5742905 Cystathionine Beta- Synthase Deficiency Hyperkalemic PeriodicSCN4A I693T Paralysis Type 1 Hyperkalemic Periodic 
SCN4A L689I ParalysisType 1 Hyperkalemic Periodic SCN4A L689V Paralysis Type 1 HyperkalemicPeriodic SCN4A M1360V Paralysis Type 1 Hyperkalemic Periodic SCN4AM1592V Paralysis Type 1 Hyperkalemic Periodic SCN4A p.A1156T ParalysisType 1 Hyperkalemic Periodic SCN4A p.M1370V Paralysis Type 1Hyperkalemic Periodic SCN4A p.R1448C Paralysis Type 1 HyperkalemicPeriodic SCN4A p.T1313M Paralysis Type 1 Hyperkalemic Periodic SCN4AR675G Paralysis Type 1 Hyperkalemic Periodic SCN4A R675Q Paralysis Type1 Hyperkalemic Periodic SCN4A R675W Paralysis Type 1 HyperkalemicPeriodic SCN4A T704M Paralysis Type 1 Hyperkalemic Periodic SCN4A V781IParalysis Type 1 Hyperomithinemia- SLC25A15 F188del Hyperammonemia-Homocitmllinuria Syndrome Hyperomithinemia- SLC25A15 F188delHyperammonemia- Homocitmllinuria Syndrome Hyperoxaluria, Primary, AGXT33insC Type 1 Hyperoxaluria, Primary, AGXT 33insC Type 1 Hyperoxaluria,Primary, AGXT F152I Type 1 Hyperoxaluria, Primary, AGXT G170R Type 1Hyperoxaluria, Primary, AGXT I244T Type 1 Hyperoxaluria, Primary, GRHPR103delG Type 2 Hyperoxaluria, Primary, GRHPR 103delG Type 2Hypochondroplasia FGFR3 Asn328Ile Hypochondroplasia FGFR3 I538VHypochondroplasia FGFR3 K650M Hypochondroplasia FGFR3 K650N 1950G > THypochondroplasia FGFR3 K650Q Hypochondroplasia FGFR3 N540K 1620C > AHypochondroplasia FGFR3 N540S Hypochondroplasia FGFR3 N540T HypokalemicPeriodic CACNA1S R528G Paralysis Type 1 Hypokalemic Periodic CACNA1SR528H Paralysis Type 1 Hypokalemic Periodic CACNA1S rs28930068,rs28930068 Paralysis Type 1 Hypokalemic Periodic CACNA1S rs28930069,rs28930069 Paralysis Type 1 Hypokalemic Periodic SCN4A R669H ParalysisType 2 Hypokalemic Periodic SCN4A R672C Paralysis Type 2 HypokalemicPeriodic SCN4A R672G Paralysis Type 2 Hypokalemic Periodic SCN4A R672HParalysis Type 2 Hypokalemic Periodic SCN4A R672S Paralysis Type 2Hypophosphatasia ALPL Asp361Val Hypophosphatasia ALPL c.1559delTHypophosphatasia ALPL c.1559delT Hypophosphatasia ALPL E174KHypophosphatasia ALPL G317D 
Hypophosphatasia ALPL Phe310Leu IsovalericAcidemia IVD A282V Isovaleric Acidemia IVD rs28940889 Krabbe DiseaseGALC EX11-17DEL Krabbe Disease GALC EX11-17DEL Krabbe Disease GALC G270DKrabbe Disease GALC rs1805078, rs1805078 Krabbe Disease GALC rs398607Leber Hereditary Optic mtDNA 14484T > C Neuropathy Leber HereditaryOptic mtDNA 15257G > A Neuropathy Leber Hereditary Optic mtDNA G14459ANeuropathy Leber Hereditary Optic mtDNA G3460A Neuropathy LeberHereditary Optic mtDNA m.11778G > A Neuropathy Leber Hereditary OpticmtDNA m.13708G > A Neuropathy Leber Hereditary Optic mtDNA m.15812G > ANeuropathy Leber Hereditary Optic mtDNA m.3394T > C Neuropathy LeberHereditary Optic mtDNA m.4216T > C Neuropathy Leber Hereditary OpticmtDNA m.4917A > G Neuropathy Leigh Syndrome, French- LRPPRC A354VCanadian Type LGMD2I FKRP L276I, rs28937900 Long Chain 3- HADHA E474Qc.1528G > C Hydroxyacyl-CoA Dehydrogenase Deficiency Long Chain 3- HADHAQ342X 1132C > T Hydroxyacyl-CoA Dehydrogenase Deficiency Maple SyrupUrine Disease BCKDHA Y438N Type 1A Maple Syrup Urine Disease BCKDHBE372X Type 1B Maple Syrup Urine Disease BCKDHB G278S Type 1B Maple SyrupUrine Disease BCKDHB R183P Type 1B McCune-Albright GNAS R201C SyndromeMcCune-Albright GNAS R201G Syndrome McCune-Albright GNAS R201H SyndromeMedium Chain Acyl- ACADM 244insT Coenzyme A Dehydrogenase DeficiencyMedium Chain Acyl- ACADM 244insT Coenzyme A Dehydrogenase DeficiencyMedium Chain Acyl- ACADM 250C > T Coenzyme A Dehydrogenase DeficiencyMedium Chain Acyl- ACADM 583G > A Coenzyme A Dehydrogenase DeficiencyMedium Chain Acyl- ACADM 616C > T Coenzyme A Dehydrogenase DeficiencyMedium Chain Acyl- ACADM 617G > A Coenzyme A Dehydrogenase DeficiencyMedium Chain Acyl- ACADM 799G > A Coenzyme A Dehydrogenase DeficiencyMedium Chain Acyl- ACADM K304E Coenzyme A Dehydrogenase DeficiencyMedium Chain Acyl- ACADM Y42H Coenzyme A Dehydrogenase DeficiencyMegalencephalic MLC1 135insC Leukoencephalopathy with Subcortical CystsMegalencephalic MLC1 135insC 
Leukoencephalopathy with Subcortical CystsMELAS mtDNA 3243A > G MELAS mtDNA 3250T > C MELAS mtDNA 3252A > G MELASmtDNA A12770G MELAS mtDNA C3256T MELAS mtDNA G13513A MELAS mtDNA T3291CMELAS mtDNA T8356C MELAS mtDNA T9957C MERRF mtDNA 8361G > A MERRF mtDNAA8296G MERRF mtDNA m.8344A > G Metachromatic ARSA c.459 + 1G > ALeukodystrophy Metachromatic ARSA p.P426L, rs28940893 LeukodystrophyMetachromatic ARSA p.T274M Leukodystrophy Metachromatic ARSA P377LLeukodystrophy Mitochondrial mtDNA A3260T Cardiomyopathy MitochondrialmtDNA A4300G Cardiomyopathy Mitochondrial mtDNA C3303T CardiomyopathyMitochondrial mtDNA T9997C Cardiomyopathy Mitochondrial DNA- mtDNA5537insT Associated Leigh Syndrome and NARP Mitochondrial DNA- mtDNA5537insT Associated Leigh Syndrome and NARP Mitochondrial DNA- mtDNA8993T > C Associated Leigh Syndrome and NARP Mitochondrial DNA- mtDNA8993T > G Associated Leigh Syndrome and NARP Mitochondrial DNA- mtDNAC11777A Associated Leigh Syndrome and NARP Mitochondrial DNA- mtDNAT10158C Associated Leigh Syndrome and NARP Mitochondrial DNA- mtDNAT10191C Associated Leigh Syndrome and NARP Mitochondrial DNA- mtDNAT8851C Associated Leigh Syndrome and NARP Mitochondrial DNA- mtDNAT9176C Associated Leigh Syndrome and NARP Mitochondrial DNA- mtDNAT9176G Associated Leigh Syndrome and NARP MTHFR Deficiency MTHFR 1298A >C MTHFR Deficiency MTHFR rs1801133, rs1801133 MTRNR1-Related HearingmtDNA 1095T > C Loss and Deafness MTRNR1-Related Hearing mtDNA 1494C > TLoss and Deafness MTRNR1-Related Hearing mtDNA 1555A > G Loss andDeafness MTRNR1-Related Hearing mtDNA 961T > G Loss and DeafnessMTRNR1-Related Hearing mtDNA A7445G Loss and Deafness MTTS1-RelatedHearing mtDNA 7443A > G Loss and Deafness MTTS1-Related Hearing mtDNA7444G > A Loss and Deafness MTTS1-Related Hearing mtDNA 7472insC Lossand Deafness MTTS1-Related Hearing mtDNA 7472insC Loss and DeafnessMTTS1-Related Hearing mtDNA 7510T > C Loss and Deafness MTTS1-RelatedHearing mtDNA 7511T > C Loss and Deafness 
MTTS1-Related Hearing mtDNA7512T > C Loss and Deafness Mucolipidosis IV MCOLN1 delEx1 3 Ex7 (511 >6944del) Mucolipidosis IV MCOLN1 delEx1 3 Ex7 (511 > 6944del)Mucolipidosis IV MCOLN1 IVS-2A > G Mucopolysaccharidosis IDUA c.46_57delType I Mucopolysaccharidosis IDUA c.46_57del Type IMucopolysaccharidosis IDUA p.A327P Type I Mucopolysaccharidosis IDUAp.P533R Type I Mucopolysaccharidosis IDUA Q70X Type IMucopolysaccharidosis IDUA W402X Type I Mucopolysaccharidosis SGSHp.R245H Type IIIA Mucopolysaccharidosis SGSH p.R74C Type IIIAMucopolysaccharidosis SGSH p.S66W Type IIIA Mucopolysaccharidosis GUSBp.D152N Type VII Multiple Endocrine RET 2047T > A Neoplasia Type 2Multiple Endocrine RET 2047T > A Neoplasia Type 2 Multiple Endocrine RET2047T > C Neoplasia Type 2 Multiple Endocrine RET 2047T > G NeoplasiaType 2 Multiple Endocrine RET 2048G > A Neoplasia Type 2 MultipleEndocrine RET A883F 2647 G > T Neoplasia Type 2 Multiple Endocrine RETGlu768Asp G > C Neoplasia Type 2 Multiple Endocrine RET M918T NeoplasiaType 2 Muscle-Eye-Brain Disease POMGNT1 c.1539 + 1G > A MYH-AssociatedMUTYH c.1376C > A Polyposis MYH-Associated MUTYH c.494A > G, rs34612342Polyposis MYH-Associated Polyposis GENE_SYMBOL_TBD rs36053993,rs36053993 Niemann-Pick Disease Due SMPD1 c.990delC to SphingomyelinaseDeficiency Niemann-Pick Disease Due SMPD1 c.990delC to SphingomyelinaseDeficiency Niemann-Pick Disease Due SMPD1 fsP330 to SphingomyelinaseDeficiency Niemann-Pick Disease Due SMPD1 fsP330 to SphingomyelinaseDeficiency Niemann-Pick Disease Due SMPD1 L302P to SphingomyelinaseDeficiency Niemann-Pick Disease Due SMPD1 R496L to SphingomyelinaseDeficiency Niemann-Pick Disease Due SMPD1 R608del to SphingomyelinaseDeficiency Niemann-Pick Disease NPC1 I1061T Type Cl Nijmegen BreakageNBN 657del5 Syndrome Nijmegen Breakage NBN 657del5 SyndromePallister-Hall Syndrome GLI3 2012delG Pallister-Hall Syndrome GLI32012delG Pallister-Hall Syndrome GLI3 2023delG Pallister-Hall SyndromeGLI3 2023delG Pendred Syndrome SLC26A4 
1197delT Pendred Syndrome SLC26A41197delT Pendred Syndrome SLC26A4 E384G Pendred Syndrome SLC26A4 IV58 +1(G−>A) Pendred Syndrome SLC26A4 L236P Pendred Syndrome SLC26A4 T416PPeroxisomal Bifunctional HSD17B4 c.302 + 1G > C Enzyme DeficiencyPeroxisomal Bifunctional HSD17B4 c.303 − 1G > A Enzyme DeficiencyPervasive Developmental NLGN4X D396X Disorders Pervasive DevelopmentalNLGN4X D396X Disorders Pervasive Developmental NLGN4X NLGN4X:1253delAGDisorders Pervasive Developmental NLGN4X NLGN4X:1253delAG DisordersPervasive Developmental NLGN3 R451C Disorders Phenylalanine HydroxylasePAH G272X Deficiency Phenylalanine Hydroxylase PAH I65T DeficiencyPhenylalanine Hydroxylase PAH IVS12 + 1G > T Deficiency PhenylalanineHydroxylase PAH L48S, rs5030841 Deficiency Phenylalanine Hydroxylase PAHR158Q, rs5030843 Deficiency Phenylalanine Hydroxylase PAH R252W,rs5030847 Deficiency Phenylalanine Hydroxylase PAH R261Q, rs5030849Deficiency Phenylalanine Hydroxylase PAH R408Q, rs5030859 DeficiencyPhenylalanine Hydroxylase PAH R408W, rs5030858 Deficiency PhenylalanineHydroxylase PAH rs5030855 Deficiency Phenylalanine Hydroxylase PAHrs5030861 Deficiency Phenylalanine Hydroxylase PAH Y414C, rs5030860Deficiency Plasminogen Activator SERPINE1 −844 G > A Inhibitor IPlasminogen Activator SERPINE1 4G/5G Inhibitor I Polycystic KidneyDisease, PKHD1 c.10412T > G Autosomal Recessive Polycystic KidneyDisease, PKHD1 c.107C > T Autosomal Recessive Polycystic Kidney Disease,PKHD1 c.1486C > T Autosomal Recessive Polycystic Kidney Disease, PKHD1c.5895dupA Autosomal Recessive Polycystic Kidney Disease, PKHD1c.5895dupA Autosomal Recessive Polycystic Kidney Disease, PKHD1c.9689delA Autosomal Recessive Polycystic Kidney Disease, PKHD1c.9689delA Autosomal Recessive PPT1-Related Neuronal PPT1 c.364A > TCeroid-Lipofuscinosis PPT1-Related Neuronal PPT1 p.L10XCeroid-Lipofuscinosis PPT1-Related Neuronal PPT1 p.R151XCeroid-Lipofuscinosis PPT1-Related Neuronal PPT1 T75PCeroid-Lipofuscinosis PROP1-related pituitary 
PROP1 301-302delAG hormomedeficiency PROP1-related pituitary PROP1 301-302delAG hormome deficiencyProthrombin F2 rs1799963 Thrombophilia Prothrombin F2 rs6025, rs6025Thrombophilia Pseudovitamin D CYP27B1 7 bp duplication in exon8Deficiency Rickets Pseudovitamin D CYP27B1 7 bp duplication in exon8Deficiency Rickets Pseudovitamin D CYP27B1 958delG Deficiency RicketsPseudovitamin D CYP27B1 958delG Deficiency Rickets Rett Syndrome MECP2806delG Rett Syndrome MECP2 806delG Rett Syndrome MECP2 A140V,rs28934908 Rett Syndrome MECP2 P152R Rett Syndrome MECP2 P225R RettSyndrome MECP2 R106W, rs28934907 Rett Syndrome MECP2 R133C Rett SyndromeMECP2 R168X Rett Syndrome MECP2 R255X Rett Syndrome MECP2 R270X RettSyndrome MECP2 R294X Rett Syndrome MECP2 R306C, rs28935468 Rett SyndromeMECP2 S134C Rett Syndrome MECP2 T158M, rs28934906 Rhizomelic PEX7p.A218V Chondrodysplasia Punctata Type 1 Rhizomelic PEX7 p.G217RChondrodysplasia Punctata Type 1 Rhizomelic PEX7 p.L292X, rs1805137Chondrodysplasia Punctata Type 1 Short Chain Acyl-CoA ACADS c.511C > T,rs1800556 Dehydrogenase Deficiency Short Chain Acyl-CoA ACADS c.625G > ADehydrogenase Deficiency Short Chain Acyl-CoA ACADS R107C DehydrogenaseDeficiency Shwachman-Diamond SBDS 183_184TA > CT SyndromeShwachman-Diamond SBDS 183_184TA > CT Syndrome Shwachman-Diamond SBDS258 + 2T > C Syndrome Sjogren-Larsson Syndrome ALDH3A2 c.943C > TSmith-Lemli-Opitz DHCR7 C380Y Syndrome Smith-Lemli-Opitz DHCR7 IVS8 −1G > C Syndrome Smith-Lemli-Opitz DHCR7 L109P Syndrome Smith-Lemli-OpitzDHCR7 L157P Syndrome Smith-Lemli-Opitz DHCR7 R352Q SyndromeSmith-Lemli-Opitz DHCR7 R352W Syndrome Smith-Lemli-Opitz DHCR7 R404CSyndrome Smith-Lemli-Opitz DHCR7 R446Q Syndrome Smith-Lemli-Opitz DHCR7T93M Syndrome Smith-Lemli-Opitz DHCR7 V326L Syndrome Smith-Lemli-OpitzDHCR7 W151X Syndrome Smith-Lemli-Opitz DHCR7 W151X Syndrome SpasticParaplegia 13 HSPD1 V72I Sulfate Transporter-Related SLC26A2 340delVOsteochondrodysplasia Sulfate Transporter-Related SLC26A2 
340delVOsteochondrodysplasia Sulfate Transporter-Related SLC26A2 c.837C > TOsteochondrodysplasia Sulfate Transporter-Related SLC26A2 C653SOsteochondrodysplasia Sulfate Transporter-Related SLC26A2 IVS1 + 2T > COsteochondrodysplasia Sulfate Transporter-Related SLC26A2 R178XOsteochondrodysplasia TFR2-Related Hereditary TFR2 AVAQ594-597delHemochromatosis TFR2-Related Hereditary TFR2 AVAQ594-597delHemochromatosis TFR2-Related Hereditary TFR2 E60X HemochromatosisTFR2-Related Hereditary TFR2 E60X Hemochromatosis TFR2-RelatedHereditary TFR2 M172K Hemochromatosis TFR2-Related Hereditary TFR2 Y250XHemochromatosis Thanatophoric Dysplasia FGFR3 G370C ThanatophoricDysplasia FGFR3 K650E Thanatophoric Dysplasia FGFR3 R248C ThanatophoricDysplasia FGFR3 S249C Thanatophoric Dysplasia FGFR3 S371C ThanatophoricDysplasia FGFR3 X807C A > T Thanatophoric Dysplasia FGFR3 X807GThanatophoric Dysplasia FGFR3 X807L Thanatophoric Dysplasia FGFR3 X807RThanatophoric Dysplasia FGFR3 X807S Thanatophoric Dysplasia FGFR3 X807WThanatophoric Dysplasia FGFR3 Y373C TPP1-Related Neuronal TPP1 c.509 −1G > A Ceroid-Lipofuscinosis TPP1-Related Neuronal TPP1 c.509 − 1G > CCeroid-Lipofuscinosis TPP1-Related Neuronal TPP1 G284VCeroid-Lipofuscinosis TPP1-Related Neuronal TPP1 p.R208XCeroid-Lipofuscinosis Transthyretin Amyloidosis TTR c.148G > A TyrosineHydroxylase- TH L205P Deficient DRD Tyrosine Hydroxylase- TH R202HDeficient DRD Tyrosinemia Type I FAH E357X Tyrosinemia Type I FAHIVS12 + 5 G > A Tyrosinemia Type I FAH IVS7 − 6 T > G Tyrosinemia Type IFAH IVS8 − 1G > C Tyrosinemia Type I FAH p.W262X Tyrosinemia Type I FAHP261L Tyrosinemia Type I FAH Q64H Wilson Disease ATP7B 1340del4 WilsonDisease ATP7B 3402delC Wilson Disease ATP7B 3402delC Wilson DiseaseATP7B H1069Q Wilson Disease ATP7B R778G Wilson Disease ATP7B W779XWilson Disease ATP7B W779X X-Linked Juvenile RS1 E72K RetinoschisisX-Linked Juvenile RS1 G109R Retinoschisis X-Linked Juvenile RS1 G74VRetinoschisis Zellweger Syndrome PEX1 c.2097_2098insT 
Spectrum ZellwegerSyndrome PEX1 c.2097_2098insT Spectrum Zellweger Syndrome PEX1c.2916delA Spectrum Zellweger Syndrome PEX1 c.2916delA SpectrumZellweger Syndrome PEX1 p.G843D Spectrum

TABLE 2
Disease/Disorder | Gene(s)
Neoplasia | PTEN; ATM; ATR; EGFR; ERBB2; ERBB3; ERBB4; Notch1; Notch2; Notch3; Notch4; AKT; AKT2; AKT3; HIF; HIF1a; HIF3a; Met; HRG; Bcl2; PPAR alpha; PPAR gamma; WT1 (Wilms Tumor); FGF Receptor Family members (5 members: 1, 2, 3, 4, 5); CDKN2a; APC; RB (retinoblastoma); MEN1; VHL; BRCA1; BRCA2; AR (Androgen Receptor); TSG101; IGF; IGF Receptor; Igf1 (4 variants); Igf2 (3 variants); Igf 1 Receptor; Igf 2 Receptor; Bax; Bcl2; caspases family (9 members: 1, 2, 3, 4, 6, 7, 8, 9, 12); Kras; Apc
Age-related Macular Degeneration | Abcr; Ccl2; Cc2; cp (ceruloplasmin); Timp3; cathepsinD; Vldlr; Ccr2
Schizophrenia | Neuregulin1 (Nrg1); Erb4 (receptor for Neuregulin); Complexin1 (Cplx1); Tph1 Tryptophan hydroxylase; Tph2 Tryptophan hydroxylase 2; Neurexin 1; GSK3; GSK3a; GSK3b
Trinucleotide Repeat Disorders | HTT (Huntington's Dx); SBMA/SMAX1/AR (Kennedy's Dx); FXN/X25 (Friedreich's Ataxia); ATX3 (Machado-Joseph's Dx); ATXN1 and ATXN2 (spinocerebellar ataxias); DMPK (myotonic dystrophy); Atrophin-1 and Atn1 (DRPLA Dx); CBP (Creb-BP - global instability); VLDLR (Alzheimer's); Atxn7; Atxn10
Fragile X Syndrome | FMR2; FXR1; FXR2; mGLUR5
Secretase Related Disorders | APH-1 (alpha and beta); Presenilin (Psen1); nicastrin (Ncstn); PEN-2
Prion-related disorders | Prp
ALS | SOD1; ALS2; STEX; FUS; TARDBP; VEGF (VEGF-a; VEGF-b; VEGF-c)
Drug addiction | Prkce (alcohol); Drd2; Drd4; ABAT (alcohol); GRIA2; Grm5; Grin1; Htr1b; Grin2a; Drd3; Pdyn; Gria1 (alcohol)
Autism | Mecp2; BZRAP1; MDGA2; Sema5A; Neurexin 1; Fragile X (FMR2 (AFF2); FXR1; FXR2; Mglur5)
Alzheimer's Disease | E1; CHIP; UCH; UBB; Tau; LRP; PICALM; Clusterin; PS1; SORL1; CR1; Vldlr; Uba1; Uba3; CHIP28 (Aqp1, Aquaporin 1); Uchl1; Uchl3; APP
Inflammation | IL-10; IL-1 (IL-1a; IL-1b); IL-13; IL-17 (IL-17a (CTLA8); IL-17b; IL-17c; IL-17d; IL-17f); IL-23; Cx3cr1; ptpn22; TNFa; NOD2/CARD15 for IBD; IL-6; IL-12 (IL-12a; IL-12b); CTLA4; Cx3cl1
Parkinson's Disease | α-Synuclein; DJ-1; LRRK2; Parkin; PINK1

In some embodiments, a method further comprises the step of diagnosing a subject based on a calling step, such as diagnosing the subject with a disease associated with a detected causal genetic variant, or reporting a likelihood that the subject has or will develop such disease. Examples of diseases, associated genes, and associated sequence variants are provided herein. In some embodiments, a result is reported via a report generator, such as described herein.
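A report generator of the kind mentioned above could, as one possibility, match each called variant against a disease/gene/variant table. The Python sketch below assumes exact string matching on (gene, variant) pairs and uses a four-row excerpt of the table above; the `Association` type and the `build_index` and `report` functions are hypothetical names for illustration, not the report generator of the disclosure. A practical system would first normalize variant nomenclature, since the table mixes protein, cDNA, and rsID notations.

```python
from typing import NamedTuple


class Association(NamedTuple):
    disease: str
    gene: str
    variant: str


# Illustrative excerpt of the disease/gene/variant table; not the full table.
KNOWN_ASSOCIATIONS = [
    Association("CFTR-Related Disorders", "CFTR", "G542X"),
    Association("CFTR-Related Disorders", "CFTR", "G551D"),
    Association("Gaucher Disease", "GBA", "N370S"),
    Association("Wilson Disease", "ATP7B", "H1069Q"),
]


def build_index(rows):
    """Group disease names by (gene, variant) so each call is one dictionary lookup."""
    index = {}
    for row in rows:
        index.setdefault((row.gene, row.variant), []).append(row.disease)
    return index


def report(called_variants, index):
    """Return one report line per called variant that matches a tabulated association.

    Calls with no tabulated association produce no line.
    """
    lines = []
    for gene, variant in called_variants:
        for disease in index.get((gene, variant), []):
            lines.append(f"{gene} {variant}: associated with {disease}")
    return lines


if __name__ == "__main__":
    index = build_index(KNOWN_ASSOCIATIONS)
    for line in report([("CFTR", "G551D"), ("TP53", "R175H")], index):
        print(line)  # prints "CFTR G551D: associated with CFTR-Related Disorders"
```

Keying the index on the (gene, variant) pair keeps lookups constant-time, and a variant linked to several diseases simply yields several report lines.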

In some embodiments, one or more causal genetic variants are sequence variants associated with a particular type or stage of cancer, or with cancer having a particular characteristic (e.g., metastatic potential, drug resistance, or drug responsiveness). In some embodiments, the disclosure provides methods for determining prognosis, such as where certain mutations are known to be associated with patient outcomes. For example, ctDNA has been shown to be a better biomarker for breast cancer prognosis than the traditional cancer antigen 15-3 (CA 15-3) and enumeration of circulating tumor cells (see e.g. Dawson, et al., N Engl J Med 368:1199 (2013)). Additionally, the methods of the present disclosure can be used in therapeutic decisions, guidance, and monitoring, as well as in the development and clinical trials of cancer therapies. For example, treatment efficacy can be monitored by comparing patient ctDNA samples taken before, during, and after treatment with particular therapies, such as molecular targeted therapies (e.g., monoclonal antibodies), chemotherapeutic drugs, radiation protocols, or combinations of these. The ctDNA can be monitored after treatment to see whether certain mutations increase or decrease, whether new mutations appear, and so on, which can allow a physician to alter a treatment (for example, to continue, stop, or change it) in a much shorter period of time than monitoring methods that track patient symptoms allow. In some embodiments, a method further comprises the step of diagnosing a subject based on a calling step, such as diagnosing the subject with a particular stage or type of cancer associated with a detected sequence variant, or reporting a likelihood that the subject has or will develop such cancer.
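The before/after comparison described above can be sketched as a per-variant classification of variant allele fractions (VAFs). This is a minimal illustration under stated assumptions: the 50% relative-change cutoff, the status labels, the `compare_timepoints` name, and the example VAFs are all hypothetical, and a real assay would also account for sequencing depth and the error model used in the calling step.

```python
def compare_timepoints(before, after, min_change=0.5):
    """Label each variant by how its variant allele fraction (VAF) changed.

    `before` and `after` map a variant label to its VAF in ctDNA samples taken
    before and after treatment; a variant absent from a sample has VAF 0.
    `min_change` is a hypothetical relative-change cutoff: with 0.5, a VAF must
    rise to at least 1.5x, or fall to at most 0.5x, its earlier value to count
    as increased or decreased.
    """
    status = {}
    for variant in sorted(set(before) | set(after)):
        b = before.get(variant, 0.0)
        a = after.get(variant, 0.0)
        if b == 0.0 and a == 0.0:
            status[variant] = "not detected"
        elif b == 0.0:
            status[variant] = "new"
        elif a == 0.0:
            status[variant] = "cleared"
        elif a >= b * (1 + min_change):
            status[variant] = "increased"
        elif a <= b * (1 - min_change):
            status[variant] = "decreased"
        else:
            status[variant] = "stable"
    return status


# Illustrative VAFs only; these are not measurements from the disclosure.
pre = {"KRAS G12D": 0.020, "TP53 R273H": 0.015}
post = {"KRAS G12D": 0.004, "TP53 R273H": 0.016, "KRAS Q61H": 0.003}
for variant, label in compare_timepoints(pre, post).items():
    print(variant, label)
```

With these example values, KRAS G12D is labeled decreased, TP53 R273H stable, and KRAS Q61H new; a new variant appearing after treatment is the kind of signal the text suggests could prompt a change of therapy.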

For example, for therapies that are specifically targeted to patients on the basis of molecular markers (e.g., Herceptin and HER2/neu status), patients are tested to determine whether certain mutations are present in their tumor; these mutations can be used to predict response or resistance to the therapy and to guide the decision whether to use it. Therefore, detecting and monitoring ctDNA during the course of treatment can be very useful in guiding treatment selection. Some primary (before treatment) or secondary (after treatment) cancer mutations have been found to be responsible for the resistance of cancers to some therapies (Misale et al., Nature 486(7404):532 (2012)).

A variety of sequence variants that are associated with one or more kinds of cancer, and that may be useful in diagnosis, prognosis, or treatment decisions, are known. Suitable target sequences of oncological significance that find use in the methods of the disclosure include, but are not limited to, alterations in the TP53 gene, the ALK gene, the KRAS gene, the PIK3CA gene, the BRAF gene, the EGFR gene, and the KIT gene. A target sequence that may be specifically amplified and/or specifically analyzed for sequence variants may be all or part of a cancer-associated gene. In some embodiments, one or more sequence variants are identified in the TP53 gene. TP53 is one of the most frequently mutated genes in human cancers; for example, TP53 mutations are found in 45% of ovarian cancers, 43% of large intestinal cancers, and 42% of cancers of the upper aerodigestive tract (see e.g. M. Olivier, et al. TP53 Mutations in Human Cancers: Origins, Consequences, and Clinical Use. Cold Spring Harb Perspect Biol. 2010 January; 2(1)). Characterization of the mutation status of TP53 can aid in clinical diagnosis, provide prognostic value, and influence treatment for cancer patients. For example, TP53 mutations may be used as a predictor of poor prognosis for patients with CNS tumors derived from glial cells and as a predictor of rapid disease progression in patients with chronic lymphocytic leukemia (see e.g. McLendon R E, et al. Cancer. 2005 Oct. 15; 104(8):1693-9; Dicker F, et al. Leukemia. 2009 January; 23(1):117-24). Sequence variation can occur anywhere within the gene. Thus, all or part of the TP53 gene can be evaluated herein. That is, as described elsewhere herein, when target-specific components (e.g., target-specific primers) are used, a plurality of TP53-specific sequences can be used, for example to amplify and detect fragments spanning the gene, rather than just one or more selected subsequences (such as mutation "hot spots") as may be used for selected targets. Alternatively, target-specific primers may be designed that hybridize upstream or downstream of one or more selected subsequences (such as a nucleotide or nucleotide region associated with an increased rate of mutation among a class of subjects, also encompassed by the term "hot spot"). Standard primers spanning such a subsequence may be designed, and/or B2B primers that hybridize upstream or downstream of such a subsequence may be designed.
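To illustrate the difference between covering a whole gene and targeting selected hot spots, the sketch below tiles overlapping windows across a target interval so that every base falls inside at least one amplicon. The `tile_region` name, the 150-base window, the 30-base overlap, and the 400-base example interval are hypothetical; no real TP53 coordinates are used, and primer selection itself (melting temperature, specificity, and so on) is not modeled.

```python
def tile_region(start, end, amplicon_len=150, overlap=30):
    """Return half-open (start, end) windows that jointly cover [start, end).

    Consecutive windows share `overlap` bases, a hypothetical margin intended
    to leave room for primer-binding sites at window ends. The last window is
    truncated at `end`.
    """
    if amplicon_len <= overlap:
        raise ValueError("amplicon_len must exceed overlap")
    if end <= start:
        return []
    step = amplicon_len - overlap
    windows = []
    pos = start
    while True:
        win_end = min(pos + amplicon_len, end)
        windows.append((pos, win_end))
        if win_end == end:
            break
        pos += step
    return windows


# Hypothetical 400-base target interval.
for window in tile_region(0, 400):
    print(window)
```

A hot-spot design would instead place a single window around each selected subsequence; tiling trades a larger number of amplicons for coverage of variation anywhere in the target.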

In some embodiments, one or more sequence variants are identified in all or part of the ALK gene. ALK fusions have been reported in as many as 7% of lung tumors, some of which are associated with EGFR tyrosine kinase inhibitor (TKI) resistance (see e.g. Shaw et al., J Clin Oncol. 2009 Sep. 10; 27(26):4247-4253). As of 2013, several different point mutations spanning the entire ALK tyrosine kinase domain had been found in patients with secondary resistance to ALK TKIs (Katayama et al., Sci Transl Med. 2012 Feb. 8; 4(120)). Thus, mutation detection in the ALK gene can be used to aid cancer therapy decisions.

In some embodiments, one or more sequence variants are identified in all or part of the KRAS gene. Approximately 15-25% of patients with lung adenocarcinoma and 40% of patients with colorectal cancer have been reported as harboring tumor-associated KRAS mutations (see e.g. Neuman, Pathol Res Pract. 2009; 205(12):858-62). Most of these mutations are located at codons 12, 13, and 61 of the KRAS gene. These mutations activate KRAS signaling pathways, which trigger growth and proliferation of tumor cells. Some studies indicate that patients with tumors harboring KRAS mutations are unlikely to benefit from anti-EGFR antibody therapy, alone or in combination with chemotherapy (see e.g. Amado et al., J Clin Oncol. 2008 Apr. 1; 26(10):1626-34; Bokemeyer et al., J Clin Oncol. 2009 Feb. 10; 27(5):663-71). One particular "hot spot" for sequence variation that may be targeted for identifying sequence variation is at position 35 of the gene. Identification of KRAS sequence variants can be used in treatment selection, such as in treatment selection for a subject with colorectal cancer.
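Assuming that "position 35" refers to coding-sequence position c.35 (counted from the A of the ATG start codon), simple arithmetic places it at the second base of codon 12, one of the hotspot codons named above. The `codon_of` helper below is an illustrative sketch, not part of the disclosed method.

```python
KRAS_HOTSPOT_CODONS = {12, 13, 61}  # codons named in the text


def codon_of(cds_pos):
    """Map a 1-based coding-sequence position (c. numbering) to its codon.

    Returns (codon number, base within codon), both 1-based.
    """
    if cds_pos < 1:
        raise ValueError("coding positions start at 1")
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


codon, base = codon_of(35)
print(codon, base, codon in KRAS_HOTSPOT_CODONS)  # prints "12 2 True"
```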

In some embodiments, one or more sequence variants are identified in all or part of the PIK3CA gene. Somatic mutations in PIK3CA have frequently been found in various types of cancer, for example in 10-30% of colorectal cancers (see e.g. Samuels et al., Science. 2004 Apr. 23; 304(5670):554). These mutations are most commonly located within two “hot spot” areas, in exon 9 (the helical domain) and exon 20 (the kinase domain), which may be specifically targeted for amplification and/or analysis for the detection of sequence variants. Position 3140 may also be specifically targeted.

In some embodiments, one or more sequence variants are identified in all or part of the BRAF gene. Nearly 50% of all malignant melanomas have been reported as harboring somatic mutations in BRAF (see e.g. Maldonado et al., J Natl Cancer Inst. 2003 Dec. 17; 95(24):1878-90). BRAF mutations are found in all melanoma subtypes but are most frequent in melanomas derived from skin without chronic sun-induced damage. Among the most common BRAF mutations in melanoma is the missense mutation V600E, which substitutes glutamic acid for valine at position 600. BRAF V600E mutations are associated with clinical benefit from BRAF inhibitor therapy. Detection of BRAF mutations can be used in melanoma treatment selection and in studies of resistance to targeted therapy.
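Protein-level variants such as V600E are written in one-letter amino acid notation: reference residue, residue position, alternate residue. As a minimal illustration (not part of the disclosed method), the hypothetical helper below expands such a change into residue names; for V600E it reports valine replaced by glutamic acid. It assumes the simple missense form only and does not handle deletions, insertions, or frameshifts.

```python
import re

# Standard one-letter amino acid codes mapped to residue names.
AMINO_ACIDS = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartic acid",
    "C": "cysteine", "E": "glutamic acid", "Q": "glutamine", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
}

# Simple missense notation such as "V600E": reference residue,
# 1-based residue position, alternate residue.
MISSENSE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def describe_missense(change: str) -> str:
    """Hypothetical helper: describe a one-letter missense change in words."""
    match = MISSENSE.match(change.strip())
    if match is None:
        raise ValueError(f"not a simple missense change: {change!r}")
    ref, pos, alt = match.groups()
    return f"{AMINO_ACIDS[ref]} at position {int(pos)} replaced by {AMINO_ACIDS[alt]}"


if __name__ == "__main__":
    print(describe_missense("V600E"))
```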

In some embodiments, one or more sequence variants are identified in all or part of the EGFR gene. EGFR mutations are frequently associated with non-small cell lung cancer (about 10% of cases in the US and 35% in East Asia; see e.g. Pao et al., Proc Natl Acad Sci USA. 2004 Sep. 7; 101(36):13306-11). These mutations typically occur within EGFR exons 18-21 and are usually heterozygous. Approximately 90% of these mutations are exon 19 deletions or exon 21 L858R point mutations.

In some embodiments, one or more sequence variants are identified in all or part of the KIT gene. Nearly 85% of gastrointestinal stromal tumors (GIST) have been reported as harboring KIT mutations (see e.g. Heinrich et al., J Clin Oncol. 2003 Dec. 1; 21(23):4342-9). The majority of KIT mutations are found in the juxtamembrane domain (exon 11, 70%), the extracellular dimerization motif (exon 9, 10-15%), the tyrosine kinase 1 (TK1) domain (exon 13, 1-3%), and the tyrosine kinase 2 (TK2) domain and activation loop (exon 17, 1-3%). Secondary KIT mutations are commonly identified after targeted therapy with imatinib, once patients have developed resistance to the therapy.

Additional non-limiting examples of genes associated with cancer, all or a portion of which may be analyzed for sequence variants according to a method described herein, include PTEN; ATM; ATR; EGFR; ERBB2; ERBB3; ERBB4; Notch1; Notch2; Notch3; Notch4; AKT; AKT2; AKT3; HIF; HIF1a; HIF3a; Met; HRG; Bcl2; PPAR alpha; PPAR gamma; WT1 (Wilms Tumor); FGF Receptor Family members (5 members: 1, 2, 3, 4, 5); CDKN2a; APC; RB (retinoblastoma); MEN1; VHL; BRCA1; BRCA2; AR (Androgen Receptor); TSG101; IGF; IGF Receptor; Igf1 (4 variants); Igf2 (3 variants); Igf1 Receptor; Igf2 Receptor; Bax; caspases family (9 members: 1, 2, 3, 4, 6, 7, 8, 9, 12); and Kras. Further examples are provided elsewhere herein.

Examples of cancers that may be diagnosed based on calling one or more sequence variants in accordance with a method disclosed herein include, without limitation, Acanthoma, Acinic cell carcinoma, Acoustic neuroma, Acral lentiginous melanoma, Acrospiroma, Acute eosinophilic leukemia, Acute lymphoblastic leukemia, Acute megakaryoblastic leukemia, Acute monocytic leukemia, Acute myeloblastic leukemia with maturation, Acute myeloid dendritic cell leukemia, Acute myeloid leukemia, Acute promyelocytic leukemia, Adamantinoma, Adenocarcinoma, Adenoid cystic carcinoma, Adenoma, Adenomatoid odontogenic tumor, Adrenocortical carcinoma, Adult T-cell leukemia, Aggressive NK-cell leukemia, AIDS-Related Cancers, AIDS-related lymphoma, Alveolar soft part sarcoma, Ameloblastic fibroma, Anal cancer, Anaplastic large cell lymphoma, Anaplastic thyroid cancer, Angioimmunoblastic T-cell lymphoma, Angiomyolipoma, Angiosarcoma, Appendix cancer, Astrocytoma, Atypical teratoid rhabdoid tumor, Basal cell carcinoma, Basal-like carcinoma, B-cell leukemia, B-cell lymphoma, Bellini duct carcinoma, Biliary tract cancer, Bladder cancer, Blastoma, Bone Cancer, Bone tumor, Brain Stem Glioma, Brain Tumor, Breast Cancer, Brenner tumor, Bronchial Tumor, Bronchioloalveolar carcinoma, Brown tumor, Burkitt's lymphoma, Cancer of Unknown Primary Site, Carcinoid Tumor, Carcinoma, Carcinoma in situ, Carcinoma of the penis, Carcinoma of Unknown Primary Site, Carcinosarcoma, Castleman's Disease, Central Nervous System Embryonal Tumor, Cerebellar Astrocytoma, Cerebral Astrocytoma, Cervical Cancer, Cholangiocarcinoma, Chondroma, Chondrosarcoma, Chordoma, Choriocarcinoma, Choroid plexus papilloma, Chronic Lymphocytic Leukemia, Chronic monocytic leukemia, Chronic myelogenous leukemia, Chronic Myeloproliferative Disorder, Chronic neutrophilic leukemia, Clear-cell tumor, Colon Cancer, Colorectal cancer, Craniopharyngioma, Cutaneous T-cell lymphoma, Degos disease, Dermatofibrosarcoma protuberans, Dermoid cyst, Desmoplastic small round cell tumor, Diffuse large B cell lymphoma, Dysembryoplastic neuroepithelial tumor, Embryonal carcinoma, Endodermal sinus tumor, Endometrial cancer, Endometrial Uterine Cancer, Endometrioid tumor, Enteropathy-associated T-cell lymphoma, Ependymoblastoma, Ependymoma, Epithelioid sarcoma, Erythroleukemia, Esophageal cancer, Esthesioneuroblastoma, Ewing Family of Tumor, Ewing Family Sarcoma, Ewing's sarcoma, Extracranial Germ Cell Tumor, Extragonadal Germ Cell Tumor, Extrahepatic Bile Duct Cancer, Extramammary Paget's disease, Fallopian tube cancer, Fetus in fetu, Fibroma, Fibrosarcoma, Follicular lymphoma, Follicular thyroid cancer, Gallbladder cancer, Ganglioglioma, Ganglioneuroma, Gastric Cancer, Gastric lymphoma, Gastrointestinal cancer, Gastrointestinal Carcinoid Tumor, Gastrointestinal stromal tumor, Germ cell tumor, Germinoma, Gestational choriocarcinoma, Gestational Trophoblastic Tumor, Giant cell tumor of bone, Glioblastoma multiforme, Glioma, Gliomatosis cerebri, Glomus tumor, Glucagonoma, Gonadoblastoma, Granulosa cell tumor, Hairy cell leukemia, Head and neck cancer, Heart cancer, Hemangioblastoma, Hemangiopericytoma, Hemangiosarcoma, Hematological malignancy, Hepatocellular carcinoma, Hepatosplenic T-cell lymphoma, Hereditary breast-ovarian cancer syndrome, Hodgkin Lymphoma, Hodgkin's lymphoma, Hypopharyngeal Cancer, Hypothalamic Glioma, Inflammatory breast cancer, Intraocular Melanoma, Islet cell carcinoma, Islet Cell Tumor, Juvenile myelomonocytic leukemia, Kaposi Sarcoma, Kaposi's sarcoma, Kidney Cancer, Klatskin tumor, Krukenberg tumor, Laryngeal cancer, Lentigo maligna melanoma, Leukemia, Lip and Oral Cavity Cancer, Liposarcoma, Lung cancer, Luteoma, Lymphangioma, Lymphangiosarcoma, Lymphoepithelioma, Lymphoid leukemia, Lymphoma, Macroglobulinemia, Malignant fibrous histiocytoma, Malignant Fibrous Histiocytoma of Bone, Malignant Glioma, Malignant Mesothelioma, Malignant peripheral nerve sheath tumor, Malignant rhabdoid tumor, Malignant triton tumor, MALT lymphoma, Mantle cell lymphoma, Mast cell leukemia, Mediastinal germ cell tumor, Mediastinal tumor, Medullary thyroid cancer, Medulloblastoma, Medulloepithelioma, Melanoma, Meningioma, Merkel Cell Carcinoma, Mesothelioma, Metastatic Squamous Neck Cancer with Occult Primary, Metastatic urothelial carcinoma, Mixed Mullerian tumor, Monocytic leukemia, Mouth Cancer, Mucinous tumor, Multiple Endocrine Neoplasia Syndrome, Multiple myeloma, Mycosis fungoides, Myelodysplastic Disease, Myelodysplastic Syndromes, Myeloid leukemia, Myeloid sarcoma, Myeloproliferative Disease, Myxoma, Nasal Cavity Cancer, Nasopharyngeal Cancer, Nasopharyngeal carcinoma, Neoplasm, Neurinoma, Neuroblastoma, Neurofibroma, Neuroma, Nodular melanoma, Non-Hodgkin lymphoma, Nonmelanoma Skin Cancer, Non-Small Cell Lung Cancer, Ocular oncology, Oligoastrocytoma, Oligodendroglioma, Oncocytoma, Optic nerve sheath meningioma, Oral cancer, Oropharyngeal Cancer, Osteosarcoma, Ovarian cancer, Ovarian Epithelial Cancer, Ovarian Germ Cell Tumor, Ovarian Low Malignant Potential Tumor, Paget's disease of the breast, Pancoast tumor, Pancreatic cancer, Papillary thyroid cancer, Papillomatosis, Paraganglioma, Paranasal Sinus Cancer, Parathyroid Cancer, Penile Cancer, Perivascular epithelioid cell tumor, Pharyngeal Cancer, Pheochromocytoma, Pineal Parenchymal Tumor of Intermediate Differentiation, Pineoblastoma, Pituicytoma, Pituitary adenoma, Pituitary tumor, Plasma Cell Neoplasm, Pleuropulmonary blastoma, Polyembryoma, Precursor T-lymphoblastic lymphoma, Primary central nervous system lymphoma, Primary effusion lymphoma, Primary Hepatocellular Cancer, Primary Liver Cancer, Primary peritoneal cancer, Primitive neuroectodermal tumor, Prostate cancer, Pseudomyxoma peritonei, Rectal Cancer, Renal cell carcinoma, Respiratory Tract Carcinoma Involving the NUT Gene on Chromosome 15, Retinoblastoma, Rhabdomyoma, Rhabdomyosarcoma, Richter's transformation, Sacrococcygeal teratoma, Salivary Gland Cancer, Sarcoma, Schwannomatosis, Sebaceous gland carcinoma, Secondary neoplasm, Seminoma, Serous tumor, Sertoli-Leydig cell tumor, Sex cord-stromal tumor, Sezary Syndrome, Signet ring cell carcinoma, Skin Cancer, Small blue round cell tumor, Small cell carcinoma, Small Cell Lung Cancer, Small cell lymphoma, Small intestine cancer, Soft tissue sarcoma, Somatostatinoma, Soot wart, Spinal Cord Tumor, Spinal tumor, Splenic marginal zone lymphoma, Squamous cell carcinoma, Stomach cancer, Superficial spreading melanoma, Supratentorial Primitive Neuroectodermal Tumor, Surface epithelial-stromal tumor, Synovial sarcoma, T-cell acute lymphoblastic leukemia, T-cell large granular lymphocyte leukemia, T-cell leukemia, T-cell lymphoma, T-cell prolymphocytic leukemia, Teratoma, Terminal lymphatic cancer, Testicular cancer, Thecoma, Throat Cancer, Thymic Carcinoma, Thymoma, Thyroid cancer, Transitional Cell Cancer of Renal Pelvis and Ureter, Transitional cell carcinoma, Urachal cancer, Urethral cancer, Urogenital neoplasm, Uterine sarcoma, Uveal melanoma, Vaginal Cancer, Verner Morrison syndrome, Verrucous carcinoma, Visual Pathway Glioma, Vulvar Cancer, Waldenstrom's macroglobulinemia, Warthin's tumor, Wilms' tumor, and combinations thereof. Non-limiting examples of specific sequence variants associated with cancer are provided in Table 3.

TABLE 3

Gene | Amino acid mutation | Nucleotide mutation (COSMIC) | % in melanoma | Clinical relevance
---|---|---|---|---
BRAF | V600E | 1799T>A | 50% (80-90% among BRAF mutations) | Increased sensitivity to BRAF inhibitors; increased sensitivity to MEK inhibitors
BRAF | V600E | 1799_1800delTGinsAA | 50% (80-90% among BRAF mutations) | Increased sensitivity to BRAF inhibitors; increased sensitivity to MEK inhibitors
BRAF | V600R | 1798_1799delGTinsAG | 50% (<5% among BRAF mutations) | Increased sensitivity to BRAF inhibitors; responds to MEK inhibitors
BRAF | V600M | 1798G>A | 50% (<1% among BRAF mutations) | Increased sensitivity to BRAF inhibitors; responds to MEK inhibitors
BRAF | V600K | 1798_1799delGTinsAA | 50% (5% among BRAF mutations) | Increased sensitivity to BRAF inhibitors; increased sensitivity to MEK inhibitors
BRAF | V600G | 1799T>G | 50% (<1% among BRAF mutations) | Increased sensitivity to BRAF inhibitors; responds to MEK inhibitors
BRAF | V600D | 1799_1800delTGinsAT | 50% (<5% among BRAF mutations) | Increased sensitivity to BRAF inhibitors; responds to MEK inhibitors
BRAF | L597V | 1789C>G | 50% (1% among BRAF mutations) | Responds to BRAF inhibitors; responds to MEK inhibitors
BRAF | L597S | 1789_1790delCTinsTC | 50% (<1% among BRAF mutations) | Responds to BRAF inhibitors; responds to MEK inhibitors
BRAF | L597Q | 1790T>A | 50% (<1% among BRAF mutations) | Responds to BRAF inhibitors; responds to MEK inhibitors
BRAF | L597R | 1790T>G | 50% (<1% among BRAF mutations) | Responds to BRAF inhibitors; responds to MEK inhibitors
BRAF | D594N | 1799_1780delTGinsGA | 50% (<1% among BRAF mutations) | Does not respond to BRAF inhibitors; may respond to MEK inhibitors
BRAF | D594H | 1780C>G | 50% (<1% among BRAF mutations) | Does not respond to BRAF inhibitors; may respond to MEK inhibitors
BRAF | D594E | 1782T>A | 50% (<1% among BRAF mutations) | Does not respond to BRAF inhibitors; may respond to MEK inhibitors
BRAF | D594E | 1782T>G | 50% (<1% among BRAF mutations) | Does not respond to BRAF inhibitors; may respond to MEK inhibitors
BRAF | D594G | 1781A>G | 50% (<1% among BRAF mutations) | Does not respond to BRAF inhibitors; may respond to MEK inhibitors
BRAF | D594V | 1781A>T | 50% (<1% among BRAF mutations) | Does not respond to BRAF inhibitors; may respond to MEK inhibitors
NRAS | G12D | 35G>A | 13-25% (4% among NRAS mutations) | Cytotoxic chemotherapy
NRAS | G13R | 37G>C | 13-25% (2% among NRAS mutations) | Cytotoxic chemotherapy; does not respond to BRAF inhibitors; may respond to MEK inhibitors
NRAS | G13D | 38G>A | 13-25% (2% among NRAS mutations) | Cytotoxic chemotherapy; does not respond to BRAF inhibitors; may respond to MEK inhibitors
NRAS | G13V | 38G>T | 13-25% (2% among NRAS mutations) | Cytotoxic chemotherapy; does not respond to BRAF inhibitors; may respond to MEK inhibitors
NRAS | Q61K | 181C>A | 13-25% (34% among NRAS mutations) | Responds to MEK inhibitors
NRAS | Q61L | 182A>T | 13-25% (8% among NRAS mutations) | Responds to MEK inhibitors
NRAS | Q61R | 182A>G | 13-25% (35% among NRAS mutations) | Responds to MEK inhibitors
NRAS | Q61H | 183A>C | 13-25% (2% among NRAS mutations) | Responds to MEK inhibitors
NRAS | Q61H | 183A>T | 13-25% (2% among NRAS mutations) | Responds to MEK inhibitors
CTNNB1 | S37F | 110C>T | ~2-3% in primary uveal melanoma (46% among CTNNB1 mutations) | -
GNA11 | Q209L | 626A>T | 34% in primary uveal melanoma (92% among GNA11 mutations) | -
GNA11 | Q209P | 626A>C | 34% in primary uveal melanoma (1% among GNA11 mutations) | -
GNAQ | Q209L | 626A>T | ~50% in primary uveal melanoma (~33% among GNAQ mutations) | Sensitive to MEK inhibitors
GNAQ | Q209P | 626A>C | ~50% in primary uveal melanoma (~64% among GNAQ mutations) | Sensitive to MEK inhibitors
GNAQ | Q209R | 626A>G | ~50% in primary uveal melanoma (~2% among GNAQ mutations) | -
KIT | K624E | 1924A>G | ~20% among KIT-mutant malignant melanomas | Increased sensitivity to KIT inhibitors
KIT | D816H | 2446G>C | ~5% among KIT-mutant malignant melanomas | Increased sensitivity to KIT inhibitors
KIT | L567P | 1727T>C | ~25% among KIT-mutant malignant melanomas | Increased sensitivity to KIT inhibitors
KIT | V559A | 1676T>C | ~20% among KIT-mutant malignant melanomas | Increased sensitivity to KIT inhibitors
KIT | V559D | 1676T>A | ~5% among KIT-mutant malignant melanomas | Increased sensitivity to KIT inhibitors
KIT | W557R | 1669T>C | ~10% among KIT-mutant malignant melanomas | Increased sensitivity to KIT inhibitors
KIT | W557R | 1669T>A | ~10% among KIT-mutant malignant melanomas | Increased sensitivity to KIT inhibitors
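The nucleotide column of Table 3 gives positions in the coding sequence, and each position falls within one codon of the protein change. The sketch below is a hypothetical helper, not part of the disclosure. It assumes 1-based positions counted from the first base of the start codon and handles only single-base substitutions such as 1799T>A. It maps each position to its codon number and to the base within that codon, placing 1799T>A in codon 600 of BRAF, 626A>T in codon 209 of GNAQ, and the KRAS hot spot at position 35 in codon 12.

```python
import re

# Single-base coding substitutions such as "1799T>A" or "c.35G > A".
# Multi-base delins entries (e.g. 1799_1800delTGinsAA) are not handled here.
SUBSTITUTION = re.compile(r"^(?:c\.)?(\d+)([ACGT])\s*>\s*([ACGT])$")


def codon_of(cds_position: int) -> tuple[int, int]:
    """Map a 1-based coding position to (codon number, base within codon 1-3)."""
    if cds_position < 1:
        raise ValueError("coding positions are 1-based")
    return (cds_position - 1) // 3 + 1, (cds_position - 1) % 3 + 1


def parse_substitution(notation: str) -> dict:
    """Hypothetical helper: parse a single-base substitution and locate its codon."""
    match = SUBSTITUTION.match(notation.strip())
    if match is None:
        raise ValueError(f"unsupported notation: {notation!r}")
    position = int(match.group(1))
    codon, base_in_codon = codon_of(position)
    return {
        "position": position,
        "ref": match.group(2),
        "alt": match.group(3),
        "codon": codon,
        "base_in_codon": base_in_codon,
    }


if __name__ == "__main__":
    for entry in ("1799T>A", "626A>T", "35G>A"):
        print(entry, parse_substitution(entry))
```

A check like this can flag table rows whose nucleotide position does not fall in the codon named by the amino acid change.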

In addition, the methods and compositions disclosed herein may be useful in discovering new, rare mutations that are associated with one or more cancer types, cancer stages, or cancer characteristics. For example, a population of individuals sharing a characteristic under analysis (e.g. a particular disease, type of cancer, or stage of cancer) may be subjected to a method of detecting sequence variants according to the disclosure so as to identify sequence variants or types of sequence variants (e.g. mutations in particular genes or parts of genes). Sequence variants that occur with a statistically significantly greater frequency among individuals sharing the characteristic than among individuals without it may be assigned a degree of association with that characteristic. The sequence variants or types of sequence variants so identified may then be used in diagnosing or treating individuals found to harbor them.
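The disclosure does not name the statistical test used to decide that a variant is significantly more frequent among individuals sharing a characteristic. One common choice, assumed here, is a one-sided Fisher's exact test on a 2×2 table of variant carriers and non-carriers with and without the characteristic. The hypothetical function below computes the hypergeometric tail directly with `math.comb`; the counts in the usage note are invented for illustration.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test p-value for enrichment (assumed test).

    2x2 table:
                          carries variant   does not
        with trait               a              b
        without trait            c              d

    Returns P(X >= a) under the hypergeometric null with fixed margins,
    i.e. the chance of seeing at least `a` carriers in the trait group.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    trait_total = a + b      # individuals with the characteristic
    carriers = a + c         # carriers across both groups
    n = a + b + c + d
    denominator = comb(n, trait_total)
    upper = min(trait_total, carriers)
    numerator = sum(
        comb(carriers, x) * comb(n - carriers, trait_total - x)
        for x in range(a, upper + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    print(fisher_exact_greater(8, 2, 1, 9))
```

For example, 8 carriers among 10 individuals with the characteristic versus 1 among 10 without gives a p-value of about 0.0027. When many variants are screened at once, the resulting p-values would typically also need a multiple-testing correction.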

Other therapeutic applications include use in non-invasive fetal diagnostics. Fetal DNA can be found in the blood of a pregnant woman. Methods and compositions described herein can be used to identify sequence variants in circulating fetal DNA, and thus may be used to diagnose one or more genetic diseases in the fetus, such as those associated with one or more causal genetic variants. Non-limiting examples of such conditions are described herein and include trisomies, cystic fibrosis, sickle-cell anemia, and Tay-Sachs disease. In this embodiment, the mother may provide a control sample and a blood sample to be used for comparison. The control sample may be any suitable tissue and will typically be processed to extract cellular DNA, which can then be sequenced to provide a reference sequence. Sequences of cfDNA corresponding to fetal genomic DNA can then be identified as sequence variants relative to the maternal reference. The father may also provide a reference sample to aid in identifying fetal sequences and sequence variants.
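A minimal sketch of the maternal-reference comparison described above follows. It assumes per-site genotypes have already been called from the maternal (and optionally paternal) control samples. The hypothetical function lists cfDNA alleles that are absent from the maternal genotype and notes whether the father carries them. The `Site` structure and the `min_reads` threshold are illustrative placeholders. In the disclosed approach, error suppression based on repeated observations of a variant would precede a step like this.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Site:
    """One genotyped position (hypothetical structure for illustration)."""
    name: str
    maternal: Set[str]                  # alleles in the mother's control-sample genotype
    cfdna_counts: Dict[str, int]        # allele -> supporting reads in plasma cfDNA
    paternal: Optional[Set[str]] = None # father's genotype, if a sample was provided


def fetal_specific_alleles(site: Site, min_reads: int = 3) -> List[Tuple[str, str, int, str]]:
    """Return cfDNA alleles absent from the maternal reference genotype.

    Each hit is (site, allele, read count, origin), where origin is
    'paternal' if the father carries the allele, 'not parental' if he
    does not (candidate de novo variant or residual error), or
    'unknown' when no paternal sample is available.
    """
    hits = []
    for allele, count in sorted(site.cfdna_counts.items()):
        if allele in site.maternal or count < min_reads:
            continue  # maternal allele, or too little support to call
        if site.paternal is None:
            origin = "unknown"
        elif allele in site.paternal:
            origin = "paternal"
        else:
            origin = "not parental"
        hits.append((site.name, allele, count, origin))
    return hits


if __name__ == "__main__":
    example = Site("site_1", {"A"}, {"A": 950, "G": 48, "T": 1}, {"A", "G"})
    print(fetal_specific_alleles(example))
```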

Still further therapeutic applications include detection of exogenous polynucleotides, such as those from pathogens (e.g. bacteria, viruses, fungi, and microbes), which may inform diagnosis and treatment selection. For example, some HIV subtypes correlate with drug resistance (see e.g. hivdb.stanford.edu/pages/genotype-rx). Similarly, HCV typing, subtyping and isotype mutations can also be determined using the methods and compositions of the present disclosure. Moreover, where an HPV subtype is correlated with a risk of cervical cancer, such a diagnosis may further inform an assessment of cancer risk. Further non-limiting examples of viruses that may be detected include Hepadnavirus, hepatitis B virus (HBV), woodchuck hepatitis virus, ground squirrel hepatitis virus (Hepadnaviridae), duck hepatitis B virus, heron hepatitis B virus, Herpesvirus, herpes simplex virus (HSV) types 1 and 2, varicella-zoster virus, cytomegalovirus (CMV), human cytomegalovirus (HCMV), mouse cytomegalovirus (MCMV), guinea pig cytomegalovirus (GPCMV), Epstein-Barr virus (EBV), human herpes virus 6 (HHV-6 variants A and B), human herpes virus 7 (HHV-7), human herpes virus 8 (HHV-8), Kaposi's sarcoma-associated herpes virus (KSHV), B virus, Poxvirus, vaccinia virus, variola virus, smallpox virus, monkeypox virus, cowpox virus, camelpox virus, ectromelia virus, mousepox virus, rabbitpox viruses, raccoonpox viruses, molluscum contagiosum virus, orf virus, milker's nodes virus, bovine papular stomatitis virus, sheeppox virus, goatpox virus, lumpy skin disease virus, fowlpox virus, canarypox virus, pigeonpox virus, sparrowpox virus, myxoma virus, hare fibroma virus, rabbit fibroma virus, squirrel fibroma viruses, swinepox virus, tanapox virus, Yabapox virus, Flavivirus, dengue virus, hepatitis C virus (HCV), GB hepatitis viruses (GBV-A, GBV-B and GBV-C), West Nile virus, yellow fever virus, St. Louis encephalitis virus, Japanese encephalitis virus, Powassan virus, tick-borne encephalitis virus, Kyasanur Forest disease virus, Togavirus, Venezuelan equine encephalitis (VEE) virus, chikungunya virus, Ross River virus, Mayaro virus, Sindbis virus, rubella virus, Retrovirus, human immunodeficiency virus (HIV) types 1 and 2, human T cell leukemia virus (HTLV) types 1, 2, and 5, mouse mammary tumor virus (MMTV), Rous sarcoma virus (RSV), lentiviruses, Coronavirus, severe acute respiratory syndrome (SARS) virus, Filovirus, Ebola virus, Marburg virus, Metapneumoviruses (MPV) such as human metapneumovirus (HMPV), Rhabdovirus, rabies virus, vesicular stomatitis virus, Bunyavirus, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus, La Crosse virus, Hantaan virus, Orthomyxovirus, influenza virus (types A, B, and C), Paramyxovirus, parainfluenza virus (PIV types 1, 2 and 3), respiratory syncytial virus (types A and B), measles virus, mumps virus, Arenavirus, lymphocytic choriomeningitis virus, Junin virus, Machupo virus, Guanarito virus, Lassa virus, Amapari virus, Flexal virus, Ippy virus, Mobala virus, Mopeia virus, Latino virus, Parana virus, Pichinde virus, Punta Toro virus (PTV), Tacaribe virus, and Tamiami virus.

Examples of bacterial pathogens that may be detected by methods of the disclosure include, without limitation, any one or more of (or any combination of) Acinetobacter baumannii, Actinobacillus sp., Actinomycetes, Actinomyces sp. (such as Actinomyces israelii and Actinomyces naeslundii), Aeromonas sp. (such as Aeromonas hydrophila, Aeromonas veronii biovar sobria (Aeromonas sobria), and Aeromonas caviae), Anaplasma phagocytophilum, Alcaligenes xylosoxidans, Actinobacillus actinomycetemcomitans, Bacillus sp. (such as Bacillus anthracis, Bacillus cereus, Bacillus subtilis, Bacillus thuringiensis, and Bacillus stearothermophilus), Bacteroides sp. (such as Bacteroides fragilis), Bartonella sp. (such as Bartonella bacilliformis and Bartonella henselae), Bifidobacterium sp., Bordetella sp. (such as Bordetella pertussis, Bordetella parapertussis, and Bordetella bronchiseptica), Borrelia sp. (such as Borrelia recurrentis and Borrelia burgdorferi), Brucella sp. (such as Brucella abortus, Brucella canis, Brucella melitensis and Brucella suis), Burkholderia sp. (such as Burkholderia pseudomallei and Burkholderia cepacia), Campylobacter sp. (such as Campylobacter jejuni, Campylobacter coli, Campylobacter lari and Campylobacter fetus), Capnocytophaga sp., Cardiobacterium hominis, Chlamydia trachomatis, Chlamydophila pneumoniae, Chlamydophila psittaci, Citrobacter sp., Coxiella burnetii, Corynebacterium sp. (such as Corynebacterium diphtheriae, Corynebacterium jeikeium and Corynebacterium), Clostridium sp. (such as Clostridium perfringens, Clostridium difficile, Clostridium botulinum and Clostridium tetani), Eikenella corrodens, Enterobacter sp. (such as Enterobacter aerogenes, Enterobacter agglomerans and Enterobacter cloacae), Escherichia coli (including opportunistic Escherichia coli, such as enterotoxigenic E. coli, enteroinvasive E. coli, enteropathogenic E. coli, enterohemorrhagic E. coli, enteroaggregative E. coli and uropathogenic E. coli), Enterococcus sp. (such as Enterococcus faecalis and Enterococcus faecium), Ehrlichia sp. (such as Ehrlichia chaffeensis and Ehrlichia canis), Erysipelothrix rhusiopathiae, Eubacterium sp., Francisella tularensis, Fusobacterium nucleatum, Gardnerella vaginalis, Gemella morbillorum, Haemophilus sp. (such as Haemophilus influenzae, Haemophilus ducreyi, Haemophilus aegyptius, Haemophilus parainfluenzae, Haemophilus haemolyticus and Haemophilus parahaemolyticus), Helicobacter sp. (such as Helicobacter pylori, Helicobacter cinaedi and Helicobacter fennelliae), Kingella kingae, Klebsiella sp. (such as Klebsiella pneumoniae, Klebsiella granulomatis and Klebsiella oxytoca), Lactobacillus sp., Listeria monocytogenes, Leptospira interrogans, Legionella pneumophila, Peptostreptococcus sp., Moraxella catarrhalis, Morganella sp., Mobiluncus sp., Micrococcus sp., Mycobacterium sp. (such as Mycobacterium leprae, Mycobacterium tuberculosis, Mycobacterium intracellulare, Mycobacterium avium, Mycobacterium bovis, and Mycobacterium marinum), Mycoplasma sp. (such as Mycoplasma pneumoniae, Mycoplasma hominis, and Mycoplasma genitalium), Nocardia sp. (such as Nocardia asteroides, Nocardia cyriacigeorgica and Nocardia brasiliensis), Neisseria sp. (such as Neisseria gonorrhoeae and Neisseria meningitidis), Pasteurella multocida, Plesiomonas shigelloides, Prevotella sp., Porphyromonas sp., Prevotella melaninogenica, Proteus sp. (such as Proteus vulgaris and Proteus mirabilis), Providencia sp. (such as Providencia alcalifaciens, Providencia rettgeri and Providencia stuartii), Pseudomonas aeruginosa, Propionibacterium acnes, Rhodococcus equi, Rickettsia sp. (such as Rickettsia rickettsii, Rickettsia akari, Rickettsia prowazekii, Orientia tsutsugamushi (formerly Rickettsia tsutsugamushi) and Rickettsia typhi), Rhodococcus sp., Serratia marcescens, Stenotrophomonas maltophilia, Salmonella sp. (such as Salmonella enterica, Salmonella typhi, Salmonella paratyphi, Salmonella enteritidis, Salmonella choleraesuis and Salmonella typhimurium), Serratia sp. (such as Serratia marcescens and Serratia liquefaciens), Shigella sp. (such as Shigella dysenteriae, Shigella flexneri, Shigella boydii and Shigella sonnei), Staphylococcus sp. (such as Staphylococcus aureus, Staphylococcus epidermidis, Staphylococcus hemolyticus and Staphylococcus saprophyticus), Streptococcus sp. (such as Streptococcus pneumoniae (for example chloramphenicol-resistant serotype 4 Streptococcus pneumoniae, spectinomycin-resistant serotype 6B Streptococcus pneumoniae, streptomycin-resistant serotype 9V Streptococcus pneumoniae, erythromycin-resistant serotype 14 Streptococcus pneumoniae, optochin-resistant serotype 14 Streptococcus pneumoniae, rifampicin-resistant serotype 18C Streptococcus pneumoniae, tetracycline-resistant serotype 19F Streptococcus pneumoniae, penicillin-resistant serotype 19F Streptococcus pneumoniae, or trimethoprim-resistant serotype 23F Streptococcus pneumoniae), Streptococcus agalactiae, Streptococcus mutans, Streptococcus pyogenes, Group A streptococci, Streptococcus pyogenes, Group B streptococci, Streptococcus agalactiae, Group C streptococci, Streptococcus anginosus, Streptococcus equisimilis, Group D streptococci, Streptococcus bovis, Group F streptococci, and Streptococcus anginosus Group G streptococci), Spirillum minus, Streptobacillus moniliformis, Treponema sp. (such as Treponema carateum, Treponema pertenue, Treponema pallidum and Treponema endemicum), Tropheryma whippelii, Ureaplasma urealyticum, Veillonella sp., Vibrio sp. (such as Vibrio cholerae, Vibrio parahaemolyticus, Vibrio vulnificus, Vibrio alginolyticus, Vibrio mimicus, Vibrio hollisae, Vibrio fluvialis, Vibrio metschnikovii, Vibrio damsela and Vibrio furnissii), Yersinia sp. (such as Yersinia enterocolitica, Yersinia pestis, and Yersinia pseudotuberculosis), and Xanthomonas maltophilia, among others.

In some embodiments, the methods and compositions of the disclosure are used in monitoring organ transplant recipients. Typically, polynucleotides from donor cells will be found in circulation against a background of polynucleotides from recipient cells. The level of circulating donor DNA will generally be stable if the organ is well accepted, and a rapid increase in donor DNA (e.g. as a frequency in a given sample) can be used as an early sign of transplant rejection. Treatment can be given at this stage to prevent transplant failure. Rejection of the donor organ has been shown to result in increased donor DNA in blood; see Snyder et al., PNAS 108(15):6629 (2011). The present disclosure provides significant sensitivity improvements over prior techniques in this area. In this embodiment, a recipient control sample (e.g. a cheek swab) and a donor control sample can be used for comparison. The recipient sample can be used to provide the reference sequence, while sequences corresponding to the donor's genome can be identified as sequence variants relative to that reference. Monitoring may comprise obtaining samples (e.g. blood samples) from the recipient over a period of time. Early samples (e.g. within the first few weeks) can be used to establish a baseline for the fraction of donor cfDNA, and subsequent samples can be compared to that baseline. In some embodiments, an increase in the fraction of donor cfDNA of about or at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 100%, 250%, 500%, 1000%, or more may serve as an indication that a recipient is in the process of rejecting donor tissue.
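The baseline comparison can be sketched as follows. The sketch assumes that reads at informative sites can already be attributed to the donor or recipient genome, for example sites where the recipient reference is homozygous and the donor sample carries a different allele. The function names and the 100% default threshold are illustrative; the threshold is simply one of the values listed above.

```python
from statistics import mean
from typing import Sequence, Tuple


def donor_fraction(donor_reads: int, total_reads: int) -> float:
    """Fraction of informative reads attributed to the donor genome (assumed input)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return donor_reads / total_reads


def percent_increase(baseline: float, current: float) -> float:
    """Relative change of `current` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline fraction must be positive")
    return 100.0 * (current - baseline) / baseline


def rejection_flag(baseline_fractions: Sequence[float], current_fraction: float,
                   threshold_pct: float = 100.0) -> Tuple[bool, float]:
    """Compare a new sample with the mean of early (baseline) samples.

    threshold_pct is illustrative; the text lists thresholds from about
    10% to 1000% or more. Returns (flag, percent increase).
    """
    baseline = mean(baseline_fractions)
    change = percent_increase(baseline, current_fraction)
    return change >= threshold_pct, change


if __name__ == "__main__":
    early = [0.010, 0.012, 0.011]   # donor fractions from the first weeks
    print(rejection_flag(early, 0.033))
```

Here a rise from a baseline donor fraction of about 1.1% to 3.3% is an increase of about 200%, which exceeds the illustrative 100% threshold.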

In some embodiments, there are provided methods of detection.

EXAMPLES

The following examples are given for the purpose of illustrating various embodiments of the invention and are not meant to limit the present invention in any fashion. The present examples, along with the methods described herein, are presently representative of preferred embodiments, are exemplary, and are not intended as limitations on the scope of the invention. Changes therein and other uses which are encompassed within the spirit of the invention as defined by the scope of the claims will occur to those skilled in the art.

Example 1: Preparing Tandem Repetitive Sequencing Library for Mutation Detection

Starting with >10 ng of ~150 bp DNA fragments in 12 μL water or 10 mM Tris-HCl pH 8.0, 2 μL 10× CircLigase buffer mix was added, and the mixture was heated to 95° C. for 2 minutes and chilled on ice for 5 minutes. To this were added 4 μL 5 M betaine, 1 μL 50 mM MnCl₂, and 1 μL CircLigase II. The reaction was incubated at 60° C. for at least 12 hours. Next, 2 μL RCA primer mix (50 nM each, to a 5 nM final concentration) was added and mixed. The mixture was heated to 95° C. for 2 minutes and then held at 42° C. for 2 hours. The CircLigation product was purified with the Zymo oligonucleotide purification kit according to the manufacturer's instructions: 28 μL water was added to the 22 μL CircLigation product for a total volume of 50 μL. This was mixed with 100 μL Oligo Binding Buffer and 400 μL ethanol, loaded onto the column, and spun for 30 seconds at >10,000×g, and the flow-through was discarded. 750 μL DNA wash buffer was added and the column was spun for 30 seconds at >10,000×g; the flow-through was discarded, and the column was spun for another 1 minute at top speed. The column was moved to a new Eppendorf tube and eluted with 17 μL water (final eluted volume was approximately 15 μL).
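The primer final concentration follows from C1V1 = C2V2. The sketch below assumes a 12 μL DNA input (the volume used in Example 2), so the ligation mix totals 20 μL before primer addition and 22 μL after, matching the 22 μL CircLigation product carried into purification. Under that assumption, 2 μL of 50 nM primer mix gives about 4.5 nM, consistent with the approximately 5 nM target. The function name is hypothetical.

```python
def final_concentration(stock: float, added_uL: float, prior_volume_uL: float) -> float:
    """Concentration after adding `added_uL` of a stock to `prior_volume_uL` of reaction.

    Uses C1*V1 = C2*V2; the result is in the same unit as `stock`.
    """
    if added_uL < 0 or prior_volume_uL < 0:
        raise ValueError("volumes must be non-negative")
    total_uL = prior_volume_uL + added_uL
    if total_uL == 0:
        raise ValueError("total volume must be positive")
    return stock * added_uL / total_uL


if __name__ == "__main__":
    # Example 1 ligation mix, assuming a 12 uL DNA input:
    # 12 uL DNA + 2 uL 10x buffer + 4 uL betaine + 1 uL MnCl2 + 1 uL CircLigase II
    ligation_uL = 12 + 2 + 4 + 1 + 1
    primer_nM = final_concentration(50.0, 2.0, ligation_uL)
    print(f"total {ligation_uL + 2} uL, primer {primer_nM:.2f} nM")
```

The same arithmetic applies to Example 2, where 1 μL of 200 nM primer mix added to a 20 μL ligation gives about 9.5 nM, close to the stated 10 nM.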

Rolling circle amplification was conducted in a volume of about 50 μL. To the 15 μL eluted sample were added 5 μL 10× RepliPHI buffer (Epicentre), 1 μL 25 mM dNTPs, 2 μL 100 mM DTT, 1 μL 100 U/μL RepliPHI Phi29, and 26 μL water. The reaction mix was incubated at 30° C. for 1 hour. RCA products were purified by adding 80 μL of Ampure beads, following the manufacturer's instructions for the remaining wash steps. For elution, 22.5 μL elution buffer was added, and the beads were incubated at 65° C. for 5 minutes. After a brief spin, the tube was returned to the magnet.
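Assembling a reaction like this one reduces to bookkeeping. The water volume is whatever remains after the sample and reagents are accounted for, and each final concentration is stock × volume ÷ total volume. The hypothetical helper below is a planning sketch, not part of the disclosed protocol. It reproduces the 26 μL of water used here and reports final concentrations of 1× buffer, 0.5 mM dNTPs (relative to the 25 mM stock as labeled), 4 mM DTT, and 2 U/μL Phi29 for the 50 μL reaction.

```python
from typing import Dict, Tuple


def assemble(total_uL: float, sample_uL: float,
             components: Dict[str, Tuple[float, float]]) -> Tuple[float, Dict[str, float]]:
    """Return (water_uL, final concentrations) for a reaction of `total_uL`.

    `components` maps a label to (stock concentration, volume added in uL);
    each final concentration is in the same unit as its stock.
    """
    reagent_uL = sum(volume for _, volume in components.values())
    water_uL = total_uL - sample_uL - reagent_uL
    if water_uL < 0:
        raise ValueError("sample plus reagents exceed the target volume")
    finals = {name: stock * volume / total_uL
              for name, (stock, volume) in components.items()}
    return water_uL, finals


# Example 1 RCA: 15 uL eluted circles in a 50 uL reaction.
rca_components = {
    "RepliPHI buffer (x)": (10, 5),
    "dNTPs (mM)": (25, 1),
    "DTT (mM)": (100, 2),
    "Phi29 (U/uL)": (100, 1),
}

if __name__ == "__main__":
    water_uL, finals = assemble(50, 15, rca_components)
    print(water_uL, finals)
```

The same ratio arithmetic gives the bead cleanup ratio: 80 μL of Ampure beads on a 50 μL reaction is 1.6×.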

About 20 μL of eluted product from the RCA reaction was mixed with 25 μL 2× Phusion Master Mix, 2.5 μL DMSO, and 0.5 μL of each B2B primer mix (10 μM). Amplification used the following PCR program: 95° C. for 1 minute; 5 cycles of extension (95° C. for 15 seconds, 55° C. for 15 seconds, 72° C. for 1 minute); 13-18 cycles of replication (95° C. for 15 seconds, 68° C. for 15 seconds, 72° C. for 1 minute); and a final extension at 72° C. for 7 minutes. PCR product size was checked by running an E-gel. If the size range was 100-500 bp, a 0.6× Ampure bead purification was performed to enrich for 300-500 bp products, and 1-2 ng was taken for another round of PCR with small RNA library adaptor primers. If the product size range was >1000 bp, products were purified with 1.6× Ampure beads, and 2-3 ng was taken for Nextera XT amplicon library preparation, followed by a 0.6× Ampure bead purification to enrich for sizes in the range of 400-1000 bp.
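The thermal program and the size-dependent branching after the E-gel check can be written as data plus a small decision function, which makes the protocol easier to automate or audit. This is a sketch under stated assumptions: hold times exclude ramping, and "a product size range >1000 bp" is interpreted as the upper end of the observed range exceeding 1000 bp. All function names are hypothetical.

```python
from typing import List, Tuple

# (stage label, number of cycles, [(temperature_C, seconds), ...])
Stage = Tuple[str, int, List[Tuple[int, int]]]


def pcr_program(replication_cycles: int) -> List[Stage]:
    """Example 1 thermal program with a chosen number of replication cycles."""
    if not 13 <= replication_cycles <= 18:
        raise ValueError("Example 1 uses 13-18 replication cycles")
    return [
        ("initial denaturation", 1, [(95, 60)]),
        ("extension", 5, [(95, 15), (55, 15), (72, 60)]),
        ("replication", replication_cycles, [(95, 15), (68, 15), (72, 60)]),
        ("final extension", 1, [(72, 420)]),
    ]


def hold_time_seconds(program: List[Stage]) -> int:
    """Total time spent at hold temperatures, ignoring ramp times."""
    return sum(cycles * sum(sec for _, sec in steps) for _, cycles, steps in program)


def post_pcr_route(min_bp: int, max_bp: int) -> str:
    """Choose the cleanup path from the observed product size range (bp)."""
    if max_bp > 1000:
        return ("1.6x Ampure cleanup; 2-3 ng into Nextera XT amplicon prep; "
                "then 0.6x Ampure to enrich 400-1000 bp")
    if min_bp >= 100 and max_bp <= 500:
        return ("0.6x Ampure cleanup to enrich 300-500 bp; "
                "1-2 ng into PCR with small RNA library adaptor primers")
    return "outside the ranges covered by Example 1; inspect the gel"


if __name__ == "__main__":
    print(hold_time_seconds(pcr_program(13)))
    print(post_pcr_route(100, 500))
```

With 13 replication cycles the program spends 2100 s (35 minutes) at hold temperatures, before ramping time is added.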

For bioinformatic analysis of the sequencing data, FASTQ files were obtained from a MiSeq run. The sequences in the FASTQ files were aligned using BWA to reference genomic sequences containing the targeted sequences (e.g. KRAS and EGFR). Using the alignment results, the regions and lengths of the repeat units, and their reference positions, were determined for each sequence (both reads). Variants at all loci were identified using the alignment results and the repeat-unit information for each sequence, and the results from the two reads were combined. The normalized frequency of each variant and the noise level were computed. Multiple additional criteria, including qscore >30 and p-value <0.0001, were then applied to the confirmed variant calls. Confirmed variants that passed these criteria were reported as true variants (mutations). The process can be automated using a programming language (e.g. Python).
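The final filtering step can be sketched in Python, which the text names as one option for automation. The example names the thresholds (qscore >30, p-value <0.0001) but not the statistical test. This sketch assumes a one-sided binomial test of the observed variant read count against the computed per-base noise level. The `Candidate` structure and function names are hypothetical. The tail probability is summed in log space so that deep coverage does not overflow floating point.

```python
from dataclasses import dataclass
from math import exp, lgamma, log, log1p
from typing import Tuple


@dataclass
class Candidate:
    """A candidate variant call (hypothetical structure for illustration)."""
    locus: str
    alt_count: int       # supporting reads after repeat-unit confirmation
    depth: int           # total reads covering the locus
    mean_qscore: float   # mean Phred quality of the supporting bases


def binomial_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = log(p), log1p(-p)
    log_n_fact = lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        total += exp(log_n_fact - lgamma(i + 1) - lgamma(n - i + 1)
                     + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def confirm(c: Candidate, noise_rate: float, min_qscore: float = 30.0,
            max_p: float = 1e-4) -> Tuple[bool, float, float]:
    """Apply the qscore > 30 and p-value < 0.0001 criteria (test choice assumed).

    Returns (passed, normalized variant frequency, p-value).
    """
    if c.depth <= 0:
        raise ValueError("depth must be positive")
    frequency = c.alt_count / c.depth
    p_value = binomial_sf(c.alt_count, c.depth, noise_rate)
    passed = c.mean_qscore > min_qscore and p_value < max_p
    return passed, frequency, p_value


if __name__ == "__main__":
    call = Candidate("KRAS c.35", alt_count=40, depth=10000, mean_qscore=36.0)
    print(confirm(call, noise_rate=0.001))
```

With an assumed noise level of 0.1%, 40 supporting reads out of 10,000 pass, while 12 out of 10,000 do not, because about 10 would be expected from noise alone.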

Example 2: Making Tandem Repetitive Sequencing Library for Detection of Sequence Variants

10 ng of DNA fragments with an average length of 150 bp, in a 12 μL volume, were used for tandem repetitive sequencing library construction. The DNA was previously treated with T4 Polynucleotide Kinase (New England Biolabs) to add a phosphate group at the 5′ terminus, leaving a hydroxyl group at the 3′ terminus. For DNA fragments generated by DNase I or other enzymatic fragmentation, or extracted from serum or plasma, this terminus-processing step was skipped. The DNA was mixed with 2 μL 10× CircLigase buffer (Epicentre CL9021K). The mixture was heated to 95° C. for 2 minutes and chilled on ice for 5 minutes, then 4 μL Betaine, 1 μL 50 mM MnCl₂, and 1 μL CircLigase II (Epicentre CL9021K) were added. The ligation reaction was performed at 60° C. for at least 12 hours. 1 μL of RCA primer mix (each primer at 200 nM, for a final concentration of about 10 nM) was added to the ligation products and mixed, heated to 96° C. for 1 minute, cooled to 42° C., and incubated at 42° C. for 2 hours.
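The "about 10 nM" primer concentration follows from simple dilution (C1V1 = C2V2): 1 μL of 200 nM primer mix added to the 20 μL ligation gives 200 × 1 / 21 ≈ 9.5 nM, and the resulting 21 μL matches the product volume carried into purification. A minimal sketch; `final_concentration` is an illustrative helper.

```python
def final_concentration(stock_conc, added_volume, final_volume):
    """C1 * V1 = C2 * V2, solved for C2 (volumes in the same units)."""
    return stock_conc * added_volume / final_volume


# 12 uL DNA + 2 uL buffer + 4 uL betaine + 1 uL MnCl2 + 1 uL CircLigase II
ligation_volume = 12 + 2 + 4 + 1 + 1
primer_nM = final_concentration(200, 1, ligation_volume + 1)
print(ligation_volume + 1, round(primer_nM, 1))  # 21 9.5
```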

The CircLigation product with hybridized RCA primers was purified with a Zymo oligonucleotide purification kit (Zymo Research, D4060). To do this, the 21 μL of product was diluted to 50 μL with 28 μL water and 1 μL of carrier RNA (Sigma-Aldrich, R5636, diluted to 200 ng/μL with 1× TE buffer). The diluted sample was mixed with 100 μL Oligo Binding Buffer and 400 μL of 100% ethanol. The mixture was loaded on the column and centrifuged for 30 seconds at >10,000×g, and the flow-through was discarded. The column was washed with 750 μL DNA Wash Buffer by centrifuging for 30 seconds at >10,000×g, discarding the flow-through, and centrifuging for another 1 minute at top speed. The column was moved to a new 1.5 mL Eppendorf tube, and the DNA was eluted with 17 μL elution buffer (10 mM Tris-Cl pH 8.0; final eluted volume about 15 μL).

5 μL 10× RepliPHI buffer, 2 μL 25 mM dNTPs, 2 μL 100 mM DTT, 1 μL 100 U/μL RepliPHI Phi29, and 25 μL water (Epicentre, RH040210) were added to the 15 μL sample eluted from the column, for a total reaction volume of 50 μL. The reaction mix was incubated at 30° C. for 2 hours. The RCA products were purified by adding 80 μL of Ampure XP beads (Beckman Coulter, A63881), following the manufacturer's instructions for the washing steps. RCA products were eluted in 22.5 μL elution buffer after a 5-minute incubation at 65° C. The tube was briefly centrifuged before being returned to the magnets.

About 20 μL of eluted product from the RCA reaction was mixed with 25 μL 2× Phusion Master Mix (New England Biolabs M0531S), 2.5 μL water, 2.5 μL DMSO, and 0.5 μL of B2B primer mix (10 μM each). Amplification was performed with the following thermocycling program: 95° C. for 2 minutes; 5 cycles of extension (95° C. for 30 seconds, 55° C. for 15 seconds, 72° C. for 1 minute); 18 cycles of replication (95° C. for 15 seconds, 68° C. for 15 seconds, 72° C. for 1 minute); and 72° C. for 7 minutes of final extension. The PCR product size was checked by electrophoresis. Once long PCR products were confirmed, the PCR products were mixed with 30 μL Ampure beads (0.6× volume) for purification to enrich PCR products >500 bp. The purified products were quantified with the Qubit 2.0 Quantification Platform (Invitrogen). About 1 ng of purified DNA was used for Nextera XT Amplicon library preparation (Illumina FC-131-1024). Library elements with an insert size of >500 bp were enriched by purification with 0.6× Ampure beads.
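Bead volumes in these protocols are expressed as a bead:sample ratio (0.6×, 1.6×, 2×), so the volume to add is the ratio multiplied by the sample volume; in this document the 0.6× cleanups are used to enrich longer products. The helpers below are a hypothetical sketch of that arithmetic, checked against the 30 μL (0.6× of 50 μL) and 80 μL-on-50 μL (1.6×) figures used above.

```python
def bead_volume(sample_ul, ratio):
    """Bead volume (uL) for a bead:sample ratio such as 0.6 or 1.6."""
    return sample_ul * ratio


def bead_ratio(bead_ul, sample_ul):
    """Bead:sample ratio implied by the volumes actually combined."""
    return bead_ul / sample_ul


print(round(bead_volume(50, 0.6), 1))  # 30.0 uL for the 0.6x cleanup of a 50 uL PCR
print(round(bead_ratio(80, 50), 2))    # 1.6 for 80 uL beads on a 50 uL RCA reaction
```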

The concentration and size distribution of the amplified libraries were analyzed using the Agilent DNA High Sensitivity Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA). Sequencing was performed on an Illumina MiSeq with a 2×250 bp MiSeq sequencing kit. According to the MiSeq manual, 12 pM of denatured library was loaded on the sequencing run.
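Loading a fixed 12 pM of library requires converting the mass concentration and mean fragment size from the Bioanalyzer into molarity. A minimal sketch, assuming the common approximation of about 660 g/mol per base pair of double-stranded DNA; the function names and the 2 ng/μL, 600 bp input are hypothetical, and the MiSeq denaturation and dilution steps themselves should follow Illumina's documentation.

```python
BP_MASS_G_PER_MOL = 660.0  # common approximation for one base pair of dsDNA


def library_molarity_nM(conc_ng_per_ul, mean_size_bp):
    """dsDNA molarity (nM) from mass concentration (ng/uL) and mean size (bp)."""
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * mean_size_bp)


def fold_dilution(stock_nM, target_pM):
    """Fold dilution needed to go from stock_nM to target_pM."""
    return stock_nM * 1000.0 / target_pM


stock = library_molarity_nM(2.0, 600)  # hypothetical Bioanalyzer readout
print(round(stock, 2), round(fold_dilution(stock, 12)))  # 5.05 421
```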

In a variation on this procedure, Illumina adapters were used in library preparation in place of Nextera preparation. To do this, about 1 ng of similarly purified DNA was used for PCR amplification with a pair of primers containing the universal part of the B2B primers and Illumina adapter sequences (P5 and P7; 5′CAAGCAGAAGACGGCATACGA3′ (SEQ ID NO: 1) and 5′ACACTCTTTCCCTACACGACGCTCTTCCGATCT3′ (SEQ ID NO: 2)). Using Phusion Master Mix, 12 cycles of replication (95° C. for 30 seconds, 55° C. for 15 seconds, 72° C. for 60 seconds) were performed. The purpose of this amplification step was to add Illumina adapters for amplicon sequencing. Amplicons >500 bp in length were enriched with 0.6× Ampure beads. The concentration and size distribution of the amplicon library were analyzed using the Agilent DNA High Sensitivity Kit for the 2100 Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA). Sequencing was performed on an Illumina MiSeq with a 2×250 bp MiSeq sequencing kit. The universal part of the B2B primers also served as sequencing primer sequences, and a custom sequencing primer was added if the primer was not contained in the Illumina kit. 12 pM of denatured library was loaded on the sequencing run.

The target region coverage in one example analysis is illustrated in FIG. 33. Table 6 below describes results for the analysis of the targeted regions.

Table 4 provides examples of RCA primers useful in methods of the disclosure. Table 5 provides examples of B2B primers useful in methods of the disclosure.

TABLE 4
Gene    RCA primer A  Sequence                    SEQ ID NO  RCA primer B  Sequence                      SEQ ID NO
PIK3CA  PIK3CA-E1a    GCTTTGAGCTGTTCTTTGTCATT     3          PIK3CA-E1b    AAAGCAATTTCTACACGAGATCC       45
PIK3CA  PIK3CA-E2a    TTTAATTGTGTGGAAGATCCAATC    4          PIK3CA-E2b    ATTAAACAGCATGCATTGAACTG       46
EGFR    EGFR-E1a      CTTTCTCACCTTCTGGGATCC       5          EGFR-E1b      AAATTCCCGTCGCTATCAAG          47
EGFR    EGFR-E2a      CCATCACGTAGGCTTCCTG         6          EGFR-E2b      ATGGCCAGCGTGGACAAC            48
EGFR    EGFR-E3a      GACATAGTCCAGGAGGCAGC        7          EGFR-E3b      TGTCCGGGAACACAAAGAC           49
EGFR    EGFR-E4a      AAGCGACGGTCCTCCAAG          8          EGFR-E4b      TGGCAGCCAGGAACGTAC            50
EGFR    EGFR-E5a      AGTACGTTCCTGGCTGCC          9          EGFR-E5b      AACACCGCAGCATGTCAAG           51
EGFR    EGFR-E6a      ATCCACTTGATAGGCACCTTG       10         EGFR-E6b      AAGTGGATGGCATTGGAATC          52
EGFR    EGFR-E7a      TCTCGCTGGCAGGGATTC          11         EGFR-E7b      CCTGGAGAAAGGAGAACGC           53
EGFR    EGFR-E9a      AACTTTGGGCGACTATCTGC        12         EGFR-E9b      AGTTCCGTGAGTTGATCATCG         54
EGFR    EGFR-E10a     TTGGAGTCTGTAGGACTTGGC       13         EGFR-E10b     ACTTCTACCGTGCCCTGATG          55
EGFR    EGFR-E11a     CTGCTGTGGGATGAGGTACTC       14         EGFR-E11b     CACAGCAGGGCTTCTTCAG           56
EGFR    EGFR-E12a     CATGGAATGCTTGTACCACATC      15         EGFR-E12b     CATGGGCAACTTCTCTGTTTC         57
BRAF    BRAF-E1a      CAGTTTGAACAGTTGTCTGGATC     16         BRAF-E1b      AAACTGATGGGACCCACTCC          58
PTEN    PTEN-E1a      TGTTTCTGCTAACGATCTCTTTG     17         PTEN-E1b      AGGAGATATCAAGAGGATGGATTC      59
PTEN    PTEN-E2a      CAGGAAATCCCATAGCAATAATG     18         PTEN-E2b      TCCTGCAGAAAGACTTGAAGG         60
PTEN    PTEN-E3a      GCTTTGAATCCAAAAACCTTAAAAC   19         PTEN-E3b      GGATTCAAAGCATAAAAACCATTAC     61
PTEN    PTEN-E4a      TACAGTACATTCATACCTACCTCTGC  20         PTEN-E4b      TATGTTGTATAACTTAAACCCGATAGAC  62
PTEN    PTEN-E5a      AAAGGATATTGTGCAACTGTGG      21         PTEN-E5b      TTGAAGACCATAACCCACCAC         63
PTEN    PTEN-E6a      CCATAGAAATCTAGGGCCTCTTG     22         PTEN-E6b      AAGTAAGGACCAGAGACAAAAAGG      64
PTEN    PTEN-E7a      CCAGATGATTCTTTAACAGGTAGC    23         PTEN-E7b      GGATTATAGACCAGTGGCACTG        65
PTEN    PTEN-E9a      GAACTTGTCTTCCCGTCGTG        24         PTEN-E9b      CATGTACTTTGAGTTCCCTCAGC       66
PTEN    PTEN-E12a     TCTGGTCCTGGTATGAAGAATG      25         PTEN-E12b     CAGGACCAGAGGAAACCTCAG         67
PTEN    PTEN-E13a     GCTCTATACTGCAAATGCTATCG     26         PTEN-E13b     CGTGCAGATAATGACAAGGAATATC     68
PTEN    PTEN-E14a     TTGGAGAAAAGTATCGGTTGG       27         PTEN-E14b     GGTCAGTTAAATTAAACATTTTGTGG    69
PTEN    PTEN-E15a     TGGTGTTACAGAAGTTGAACTGC     28         PTEN-E15b     GATGTTAGTGACAATGAACCTGATC     70
KRAS    KRAS-E1a      AAGAGTGCCTTGACGATACAGC      29         KRAS-E1b      TCTTGCCTACGCCACCAG            71
TP53    TP53-E1a      CCTGACTCAGACTGACATTCTCC     30         TP53-E1b      CAGGCCCTTCTGTCTTGAAC          72
TP53    TP53-E2a      ATGTTCCGAGAGCTGAATGAG       31         TP53-E2b      GAACATCTCGAAGCGCTCAC          73
TP53    TP53-E3a      TTAAAGGACCAGACCAGCTTTC      32         TP53-E3b      TTATGGTATAAGTTGGTGTTCTGAAG    74
TP53    TP53-E4a      CTTGGGACCTCTTATCAAGTGG      33         TP53-E4b      AGAGGTCCCAAGACTTAGTACCTG      75
TP53    TP53-E5a      AAGCAAGCAGGACAAGAAGC        34         TP53-E5b      GCTTGCTTACCTCGCTTAGTG         76
TP53    TP53-E6a      GGGACGGAACAGCTTTGAG         35         TP53-E6b      TTCCGTCCCAGTAGATTACCAC        77
TP53    TP53-E7a      CAACTACATGTGTAACAGTTCCTGC   36         TP53-E7b      CATGTAGTTGTAGTGGATGGTGG       78
TP53    TP53-E8a      GTGGAGTATTTGGATGACAGAAAC    37         TP53-E8b      ATACTCCACACGCAAATTTCC         79
TP53    TP53-E9a      TGCTCAGATAGCGATGGTGAG       38         TP53-E9b      CTATCTGAGCAGCGCTCATG          80
TP53    TP53-E10a     CTGTGCAGCTGTGGGTTGA         39         TP53-E10b     GCAGGTCTTGGCCAGTTG            81
TP53    TP53-E12a     AAGTCTGTGACTTGCACGGTC       40         TP53-E12b     TGTCCCAGAATGCAAGAAGC          82
TP53    TP53-E13a     CCTGTCATCTTCTGTCCCTTC       41         TP53-E13b     GATGACAGGGGCCAGGAG            83
TP53    TP53-E14a     AAGACCCAGGTCCAGATGAAG       42         TP53-E14b     TGGGTCTTCAGTGAACCATTG         84
TP53    TP53-E15a     CTGCTCTTGTCTTTCAGACTTCC     43         TP53-E15b     GAGCAGAAAGTCAGTCCCATG         85
TP53    TP53-E16a     CTCTGAGTCAGGAAACATTTTCAG    44         TP53-E16b     GCTCGACGCTAGGATCTGAC          86

TABLE 5
Gene    B2B primer A  Sequence                                              SEQ ID NO  B2B primer B  Sequence                                           SEQ ID NO
PIK3CA  PIK3CA-BX1a   GTTCAGAGTTCTACAGTCCGACGATCGCTTTGAGCTGTTCTTTGTCATT     87         PIK3CA-BX1b   CCTTGGCACCCGAGAATTCCAAAAGCAATTTCTACACGAGATCC       129
PIK3CA  PIK3CA-BX2a   GTTCAGAGTTCTACAGTCCGACGATCTTTAATTGTGTGGAAGATCCAATC    88         PIK3CA-BX2b   CCTTGGCACCCGAGAATTCCAATTAAACAGCATGCATTGAACTG       130
EGFR    EGFR-BX1a     GTTCAGAGTTCTACAGTCCGACGATCCTTTCTCACCTTCTGGGATCC       89         EGFR-BX1b     CCTTGGCACCCGAGAATTCCAAAATTCCCGTCGCTATCAAG          131
EGFR    EGFR-BX2a     GTTCAGAGTTCTACAGTCCGACGATCCCATCACGTAGGCTTCCTG         90         EGFR-BX2b     CCTTGGCACCCGAGAATTCCAATGGCCAGCGTGGACAAC            132
EGFR    EGFR-BX3a     GTTCAGAGTTCTACAGTCCGACGATCGACATAGTCCAGGAGGCAGC        91         EGFR-BX3b     CCTTGGCACCCGAGAATTCCATGTCCGGGAACACAAAGAC           133
EGFR    EGFR-BX4a     GTTCAGAGTTCTACAGTCCGACGATCAAGCGACGGTCCTCCAAG          92         EGFR-BX4b     CCTTGGCACCCGAGAATTCCATGGCAGCCAGGAACGTAC            134
EGFR    EGFR-BX5a     GTTCAGAGTTCTACAGTCCGACGATCAGTACGTTCCTGGCTGCC          93         EGFR-BX5b     CCTTGGCACCCGAGAATTCCAAACACCGCAGCATGTCAAG           135
EGFR    EGFR-BX6a     GTTCAGAGTTCTACAGTCCGACGATCATCCACTTGATAGGCACCTTG       94         EGFR-BX6b     CCTTGGCACCCGAGAATTCCAAAGTGGATGGCATTGGAATC          136
EGFR    EGFR-BX7a     GTTCAGAGTTCTACAGTCCGACGATCTCTCGCTGGCAGGGATTC          95         EGFR-BX7b     CCTTGGCACCCGAGAATTCCACCTGGAGAAAGGAGAACGC           137
EGFR    EGFR-BX9a     GTTCAGAGTTCTACAGTCCGACGATCAACTTTGGGCGACTATCTGC        96         EGFR-BX9b     CCTTGGCACCCGAGAATTCCAAGTTCCGTGAGTTGATCATCG         138
EGFR    EGFR-BX10a    GTTCAGAGTTCTACAGTCCGACGATCTTGGAGTCTGTAGGACTTGGC       97         EGFR-BX10b    CCTTGGCACCCGAGAATTCCAACTTCTACCGTGCCCTGATG          139
EGFR    EGFR-BX11a    GTTCAGAGTTCTACAGTCCGACGATCCTGCTGTGGGATGAGGTACTC       98         EGFR-BX11b    CCTTGGCACCCGAGAATTCCACACAGCAGGGCTTCTTCAG           140
EGFR    EGFR-BX12a    GTTCAGAGTTCTACAGTCCGACGATCCATGGAATGCTTGTACCACATC      99         EGFR-BX12b    CCTTGGCACCCGAGAATTCCACATGGGCAACTTCTCTGTTTC         141
BRAF    BRAF-BX1a     GTTCAGAGTTCTACAGTCCGACGATCCAGTTTGAACAGTTGTCTGGATC     100        BRAF-BX1b     CCTTGGCACCCGAGAATTCCAAAACTGATGGGACCCACTCC          142
PTEN    PTEN-BX1a     GTTCAGAGTTCTACAGTCCGACGATCTGTTTCTGCTAACGATCTCTTTG     101        PTEN-BX1b     CCTTGGCACCCGAGAATTCCAAGGAGATATCAAGAGGATGGATTC      143
PTEN    PTEN-BX2a     GTTCAGAGTTCTACAGTCCGACGATCCAGGAAATCCCATAGCAATAATG     102        PTEN-BX2b     CCTTGGCACCCGAGAATTCCATCCTGCAGAAAGACTTGAAGG         144
PTEN    PTEN-BX3a     GTTCAGAGTTCTACAGTCCGACGATCGCTTTGAATCCAAAAACCTTAAAAC   103        PTEN-BX3b     CCTTGGCACCCGAGAATTCCAGGATTCAAAGCATAAAAACCATTAC     145
PTEN    PTEN-BX4a     GTTCAGAGTTCTACAGTCCGACGATCTACAGTACATTCATACCTACCTCTGC  104        PTEN-BX4b     CCTTGGCACCCGAGAATTCCATATGTTGTATAACTTAAACCCGATAGAC  146
PTEN    PTEN-BX5a     GTTCAGAGTTCTACAGTCCGACGATCAAAGGATATTGTGCAACTGTGG      105        PTEN-BX5b     CCTTGGCACCCGAGAATTCCATTGAAGACCATAACCCACCAC         147
PTEN    PTEN-BX6a     GTTCAGAGTTCTACAGTCCGACGATCCCATAGAAATCTAGGGCCTCTTG     106        PTEN-BX6b     CCTTGGCACCCGAGAATTCCAAAGTAAGGACCAGAGACAAAAAGG      148
PTEN    PTEN-BX7a     GTTCAGAGTTCTACAGTCCGACGATCCCAGATGATTCTTTAACAGGTAGC    107        PTEN-BX7b     CCTTGGCACCCGAGAATTCCAGGATTATAGACCAGTGGCACTG        149
PTEN    PTEN-BX9a     GTTCAGAGTTCTACAGTCCGACGATCGAACTTGTCTTCCCGTCGTG        108        PTEN-BX9b     CCTTGGCACCCGAGAATTCCACATGTACTTTGAGTTCCCTCAGC       150
PTEN    PTEN-BX12a    GTTCAGAGTTCTACAGTCCGACGATCTCTGGTCCTGGTATGAAGAATG      109        PTEN-BX12b    CCTTGGCACCCGAGAATTCCACAGGACCAGAGGAAACCTCAG         151
PTEN    PTEN-BX13a    GTTCAGAGTTCTACAGTCCGACGATCGCTCTATACTGCAAATGCTATCG     110        PTEN-BX13b    CCTTGGCACCCGAGAATTCCACGTGCAGATAATGACAAGGAATATC     152
PTEN    PTEN-BX14a    GTTCAGAGTTCTACAGTCCGACGATCTTGGAGAAAAGTATCGGTTGG       111        PTEN-BX14b    CCTTGGCACCCGAGAATTCCAGGTCAGTTAAATTAAACATTTTGTGG    153
PTEN    PTEN-BX15a    GTTCAGAGTTCTACAGTCCGACGATCTGGTGTTACAGAAGTTGAACTGC     112        PTEN-BX15b    CCTTGGCACCCGAGAATTCCAGATGTTAGTGACAATGAACCTGATC     154
KRAS    KRAS-BX1a     GTTCAGAGTTCTACAGTCCGACGATCAAGAGTGCCTTGACGATACAGC      113        KRAS-BX1b     CCTTGGCACCCGAGAATTCCATCTTGCCTACGCCACCAG            155
TP53    TP53-BX1a     GTTCAGAGTTCTACAGTCCGACGATCCCTGACTCAGACTGACATTCTCC     114        TP53-BX1b     CCTTGGCACCCGAGAATTCCACAGGCCCTTCTGTCTTGAAC          156
TP53    TP53-BX2a     GTTCAGAGTTCTACAGTCCGACGATCATGTTCCGAGAGCTGAATGAG       115        TP53-BX2b     CCTTGGCACCCGAGAATTCCAGAACATCTCGAAGCGCTCAC          157
TP53    TP53-BX3a     GTTCAGAGTTCTACAGTCCGACGATCTTAAAGGACCAGACCAGCTTTC      116        TP53-BX3b     CCTTGGCACCCGAGAATTCCATTATGGTATAAGTTGGTGTTCTGAAG    158
TP53    TP53-BX4a     GTTCAGAGTTCTACAGTCCGACGATCCTTGGGACCTCTTATCAAGTGG      117        TP53-BX4b     CCTTGGCACCCGAGAATTCCAAGAGGTCCCAAGACTTAGTACCTG      159
TP53    TP53-BX5a     GTTCAGAGTTCTACAGTCCGACGATCAAGCAAGCAGGACAAGAAGC        118        TP53-BX5b     CCTTGGCACCCGAGAATTCCAGCTTGCTTACCTCGCTTAGTG         160
TP53    TP53-BX6a     GTTCAGAGTTCTACAGTCCGACGATCGGGACGGAACAGCTTTGAG         119        TP53-BX6b     CCTTGGCACCCGAGAATTCCATTCCGTCCCAGTAGATTACCAC        161
TP53    TP53-BX7a     GTTCAGAGTTCTACAGTCCGACGATCCAACTACATGTGTAACAGTTCCTGC   120        TP53-BX7b     CCTTGGCACCCGAGAATTCCACATGTAGTTGTAGTGGATGGTGG       162
TP53    TP53-BX8a     GTTCAGAGTTCTACAGTCCGACGATCGTGGAGTATTTGGATGACAGAAAC    121        TP53-BX8b     CCTTGGCACCCGAGAATTCCAATACTCCACACGCAAATTTCC         163
TP53    TP53-BX9a     GTTCAGAGTTCTACAGTCCGACGATCTGCTCAGATAGCGATGGTGAG       122        TP53-BX9b     CCTTGGCACCCGAGAATTCCACTATCTGAGCAGCGCTCATG          164
TP53    TP53-BX10a    GTTCAGAGTTCTACAGTCCGACGATCCTGTGCAGCTGTGGGTTGA         123        TP53-BX10b    CCTTGGCACCCGAGAATTCCAGCAGGTCTTGGCCAGTTG            165
TP53    TP53-BX12a    GTTCAGAGTTCTACAGTCCGACGATCAAGTCTGTGACTTGCACGGTC       124        TP53-BX12b    CCTTGGCACCCGAGAATTCCATGTCCCAGAATGCAAGAAGC          166
TP53    TP53-BX13a    GTTCAGAGTTCTACAGTCCGACGATCCCTGTCATCTTCTGTCCCTTC       125        TP53-BX13b    CCTTGGCACCCGAGAATTCCAGATGACAGGGGCCAGGAG            167
TP53    TP53-BX14a    GTTCAGAGTTCTACAGTCCGACGATCAAGACCCAGGTCCAGATGAAG       126        TP53-BX14b    CCTTGGCACCCGAGAATTCCATGGGTCTTCAGTGAACCATTG         168
TP53    TP53-BX15a    GTTCAGAGTTCTACAGTCCGACGATCCTGCTCTTGTCTTTCAGACTTCC     127        TP53-BX15b    CCTTGGCACCCGAGAATTCCAGAGCAGAAAGTCAGTCCCATG         169
TP53    TP53-BX16a    GTTCAGAGTTCTACAGTCCGACGATCCTCTGAGTCAGGAAACATTTTCAG    128        TP53-BX16b    CCTTGGCACCCGAGAATTCCAGCTCGACGCTAGGATCTGAC          170
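Reading Tables 4 and 5 side by side, each B2B primer appears to consist of a universal tail followed by the corresponding RCA primer: every primer A begins with GTTCAGAGTTCTACAGTCCGACGATC and every primer B with CCTTGGCACCCGAGAATTCCA. This relationship is inferred from the listed sequences rather than stated in the text. The sketch below rebuilds the KRAS pair; the function and constant names are illustrative.

```python
# Universal tails inferred from Table 5 (each B2B primer = tail + RCA primer).
TAIL_A = "GTTCAGAGTTCTACAGTCCGACGATC"
TAIL_B = "CCTTGGCACCCGAGAATTCCA"


def b2b_pair(rca_a, rca_b):
    """Build a B2B primer pair from an RCA primer pair."""
    return TAIL_A + rca_a, TAIL_B + rca_b


def gene_specific_part(b2b_primer, tail):
    """Strip a universal tail, checking that it is actually present."""
    if not b2b_primer.startswith(tail):
        raise ValueError("primer does not carry the expected universal tail")
    return b2b_primer[len(tail):]


# KRAS: RCA primers SEQ ID NOs 29 and 71 -> B2B primers SEQ ID NOs 113 and 155.
kras_a, kras_b = b2b_pair("AAGAGTGCCTTGACGATACAGC", "TCTTGCCTACGCCACCAG")
print(kras_a)  # GTTCAGAGTTCTACAGTCCGACGATCAAGAGTGCCTTGACGATACAGC
print(kras_b)  # CCTTGGCACCCGAGAATTCCATCTTGCCTACGCCACCAG
```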

TABLE 6
Metric                            Results
Reads                             1.5M
% target bases with 1× coverage   97.8%
% on target                       63.4%
% duplicates                      18.2%
Mean coverage                     74.5×
S.D. of coverage                  0.21

Example 3: Fragmentation of Genomic DNA for Sequencing Library Construction

1 μL of genomic DNA was processed using a NEBNext dsDNA Fragmentase kit (New England Biolabs) following the manufacturer's protocol, except that the incubation time was extended to 45 minutes at 37° C. The fragmentation reaction was stopped by adding 5 μL of 0.5 M EDTA pH 8.0, and the DNA was purified by adding 2 volumes of Ampure XP beads (Beckman Coulter, A63881) according to the manufacturer's protocol. Fragmented DNA was analyzed on a Bioanalyzer with a High Sensitivity DNA kit (Agilent). The size range of fragmented DNA was typically from about 100 bp to about 200 bp, with a peak at about 150 bp.

Example 4: Library Preparation Procedures

In this example, a KAPA Library Prep Kit (KK8230) was used for illustration purposes.

For steps involving bead purification, AMPure XP Beads (cat #A63881) were equilibrated to room temperature and thoroughly resuspended before mixing with the sample. After thorough mixing with the sample on a vortex mixer, the mixture was incubated at room temperature for 15 minutes to allow DNA to bind to the beads. The beads were then placed on a magnetic stand until the liquid was clear, washed twice with 200 μL 80% ethanol, and dried at room temperature for 15 minutes.

For the end-repair reaction, up to 50 μL (2-10 ng) of cell-free DNA was mixed with 20 μL of end repair master mix (8 μL water, 7 μL 10× KAPA end repair buffer, and 5 μL KAPA end repair enzyme mix) and incubated for 30 minutes at 20° C. 120 μL of AMPure XP Beads was then added to the 70 μL end repair reaction, and the sample was purified as above.
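When several samples are processed, the per-reaction end repair recipe is usually prepared as a master mix. A minimal sketch; `scale_master_mix` and the 10% overage are assumptions for illustration, not values from the protocol.

```python
def scale_master_mix(per_reaction_ul, n_reactions, overage=0.1):
    """Scale a per-reaction recipe (uL) to n reactions plus a fractional overage."""
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction_ul.items()}


# Per-reaction end repair master mix from the text (20 uL per sample).
end_repair = {"water": 8, "10x KAPA end repair buffer": 7, "KAPA end repair enzyme mix": 5}
mix = scale_master_mix(end_repair, n_reactions=8)
print(mix)
# {'water': 70.4, '10x KAPA end repair buffer': 61.6, 'KAPA end repair enzyme mix': 44.0}
```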

For A-tailing reactions, the dried beads carrying the end-repaired DNA fragments were mixed with A-tailing master mix (42 μL water, 5 μL 10× KAPA A-tailing buffer, and KAPA A-tailing enzyme). The reaction was incubated at 30° C. for 30 minutes. After adding 90 μL of PEG solution (20% PEG 8000, 2.5 M NaCl), the mixture was washed according to the bead purification protocol above. This A-tailing step was skipped for blunt-end ligation reactions.

For linker ligation, two oligonucleotides having the following sequences (5′ to 3′) were used to form an adapter polynucleotide duplex: /5Phos/CCATTTCATTACCTCTTTCTCCGCACCCGACATAGAT*T (SEQ ID NO: 171) and /5Phos/ATCTATGTCGGGTGCGGAGAAAGAGGTAATGAAATGG*T (SEQ ID NO: 172). The dried beads carrying end-repaired (for blunt-end ligation) or A-tailed (for linker-based ligation) DNA were mixed with 45 μL of ligation master mix (30 μL water, 10 μL 5× KAPA ligation buffer, and 5 μL KAPA T4 DNA ligase), plus 5 μL water (for blunt-end ligation) or 5 μL of an equimolar mix of the linker oligonucleotides (for linker-based ligation). The beads were thoroughly resuspended and incubated at 20° C. for 15 minutes. After adding 50 μL of PEG solution (see above), the mixture was washed according to the bead purification protocol above.
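The two linker oligonucleotides are designed to anneal: ignoring its final base, SEQ ID NO: 172 is the reverse complement of SEQ ID NO: 171, so the duplex carries a single 3′ T overhang at each end, compatible with the A-tailed fragments used for linker-based ligation. The sketch below checks this. It assumes IDT-style notation, in which /5Phos/ marks a 5′ phosphate and * a phosphorothioate linkage; the text does not define the notation, and the function names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def strip_notation(oligo):
    """Remove /.../ modification tags and '*' linkage marks, keeping the bases."""
    return re.sub(r"/[^/]*/", "", oligo).replace("*", "")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def single_base_3prime_overhangs(top, bottom):
    """Return the 3' overhang bases if the two oligos anneal over all but their
    last base (one-base 3' overhang on each strand); otherwise None."""
    t, b = strip_notation(top), strip_notation(bottom)
    if len(t) == len(b) and reverse_complement(t[:-1]) == b[:-1]:
        return t[-1], b[-1]
    return None


seq_171 = "/5Phos/CCATTTCATTACCTCTTTCTCCGCACCCGACATAGAT*T"
seq_172 = "/5Phos/ATCTATGTCGGGTGCGGAGAAAGAGGTAATGAAATGG*T"
print(single_base_3prime_overhangs(seq_171, seq_172))  # ('T', 'T')
```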

Multiple displacement amplification (MDA) was performed using Illustra Genomiphi V2 DNA Amplification Kits. The dried beads carrying the ligated fragment chains were resuspended in 9 μL of random hexamer-containing buffer, heated at 95° C. for 3 minutes, and rapidly cooled on ice. After adding 1 μL of enzyme mix, the cooled sample was incubated at 30° C. for 90 minutes. The reaction was then stopped by heating at 65° C. for 10 minutes. After adding 30 μL of PEG solution (see above), the mixture was washed according to the purification protocol described above and resuspended in 200 μL TE (with an incubation at 65° C. for 5 minutes). If desired, the purified product could be quantitated by quantitative PCR or digital droplet PCR (ddPCR), or carried forward to next generation sequencing (NGS).

After MDA, long ligated fragment chains (e.g. ≥2 kb) were sonicated to ~300 bp using a Covaris S220 in a 130 μL total volume. The manufacturer's protocol indicated 140 W peak power, 10% duty factor, 200 cycles per burst, and 80 seconds of treatment time. The fragment length of ~300 bp was selected to increase the chance of keeping an original cell-free DNA fragment intact. A standard library preparation protocol can be used to add adaptors to the sonicated DNA fragments for sequencing if desired. A variety of read compositions were returned from paired-end sequencing runs on Illumina sequencers (either HiSeq or MiSeq). Reads in which the junction (either a self-junction, or an adapter junction where adapters were included in the ligation step) was internal to the read (flanked 5′ and 3′ by non-adapter sequence) were used to barcode sequences of interest.
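Selecting reads whose junction is internal can be sketched, for the adapter-junction case, as a search for the adapter sequence with enough flanking sequence on both sides. This is an illustration under assumptions: the adapter string (the double-stranded part of the SEQ ID NO: 171 linker), the 20-base `min_flank` threshold, and the function name are hypothetical; only one strand orientation is searched; and self-junctions, which have no fixed sequence, would need alignment-based detection instead.

```python
# Hypothetical adapter: the double-stranded part of the SEQ ID NO: 171 linker.
ADAPTER = "CCATTTCATTACCTCTTTCTCCGCACCCGACATAGAT"


def split_at_internal_junction(read, adapter=ADAPTER, min_flank=20):
    """Return (left, right) if `adapter` occurs exactly once in `read` and is
    flanked on both sides by at least `min_flank` bases; otherwise None."""
    if read.count(adapter) != 1:
        return None
    start = read.find(adapter)
    left, right = read[:start], read[start + len(adapter):]
    if len(left) >= min_flank and len(right) >= min_flank:
        return left, right
    return None


internal = "A" * 25 + ADAPTER + "G" * 30
at_edge = ADAPTER + "G" * 60
print(split_at_internal_junction(internal) is not None)  # True
print(split_at_internal_junction(at_edge))               # None
```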

Example 5: Circularization and Amplification

This example illustrates a circularization and amplification procedure according to methods herein. The procedure used the following supplies: PCR machine (e.g. MJ Research PTC-200 Peltier thermal cycler); CircLigase II ssDNA ligase (Epicentre cat #CL9025K); exonucleases (e.g. ExoI, New England Biolabs cat #M0293S; ExoIII, New England Biolabs cat #M0206S); T4 Polynucleotide Kinase (New England Biolabs cat #M0201S); whole genome amplification kit (e.g. GE Healthcare Illustra Ready-To-Go GenomiPhi V3 DNA amplification kit); GlycoBlue (e.g. Ambion cat #AM9515); microcentrifuge (e.g. Eppendorf 5415D); DNA purification beads (e.g. Agencourt AMPure XP, Beckman Coulter cat #A63881); magnetic stand (e.g. The MagnaRack™, Invitrogen cat #CS15000); Qubit® 2.0 Fluorometer (Invitrogen cat #Q32866); Molecular Probes dsDNA HS assay kit (Life Technologies cat #Q32854); and a Bioanalyzer (Agilent 2100) with high sensitivity DNA reagents (cat #5067-4626).

For amplification of DNA fragments lacking a 5′ terminal phosphate (e.g. cell-free DNA), the first step was end repair and formation of single strands. DNA was denatured at 96° C. for 30 seconds (e.g. on a PCR machine). A polynucleotide kinase (PNK) reaction was prepared by combining 40 μL of DNA and 5 μL 10× PNK reaction buffer, followed by incubation at 37° C. for 30 minutes. 1 mM ATP and the PNK enzyme were then added to the reaction, which was incubated at 37° C. for 45 minutes. A buffer exchange was conducted by precipitating and resuspending the DNA: 50 μL DNA from the PNK reaction, 5 μL 0.5 M sodium acetate pH 5.2, 1 μL GlycoBlue, 1 μL oligo (100 ng/μL), and 150 μL 100% ethanol were combined. The mixture was incubated at −80° C. for 30 minutes and centrifuged at 16k rpm for 5 minutes to pellet the DNA. The DNA pellet was washed with 500 μL of 70% ethanol and air dried for 5 minutes at room temperature, and the DNA was resuspended in 12 μL 10 mM Tris-Cl pH 8.0.

Resuspended DNA was then circularized by ligation. The DNA was denatured at 96° C. for 30 seconds, the sample was chilled on ice for 2 minutes, and ligase mix (2 μL 10× CircLigase buffer, 4 μL 5 M Betaine, 1 μL 50 mM MnCl₂, 1 μL CircLigase II) was added. The ligation reaction was incubated at 60° C. for 16 hours on a PCR machine. Unligated polynucleotides were then degraded by exonuclease digestion. For this, the DNA was denatured at 80° C. for 45 seconds, and 1 μL exonuclease mix (ExoI 20 U/μL:ExoIII 100 U/μL = 1:2) was added to each tube, mixed by pipetting up and down 5 times, and spun briefly. The digestion mix was incubated at 37° C. for 45 minutes. The volume was brought to 50 μL with 30 μL of water, and a further buffer exchange was conducted by precipitation and resuspension as above.
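The 1:2 ExoI:ExoIII mix delivers a fixed amount of each enzyme per microliter. Assuming the ratio refers to volumes of the two stocks (the text does not say), 1 μL of mix contains 1/3 μL ExoI (about 6.7 U) and 2/3 μL ExoIII (about 66.7 U). A sketch using exact fractions; the function name is hypothetical.

```python
from fractions import Fraction


def units_per_ul_of_mix(stocks_u_per_ul, volume_ratio):
    """Units of each enzyme in 1 uL of a mix blended by volume ratio.

    stocks_u_per_ul: stock activities (U/uL); volume_ratio: parts of each stock.
    """
    total_parts = sum(volume_ratio)
    return [Fraction(stock) * Fraction(parts, total_parts)
            for stock, parts in zip(stocks_u_per_ul, volume_ratio)]


exo_i, exo_iii = units_per_ul_of_mix([20, 100], [1, 2])
print(float(exo_i), float(exo_iii))  # 6.666666666666667 66.66666666666667
```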

For whole genome amplification (WGA), purified DNA was first denatured at 65° C. for 5 minutes. 10 μL of denaturation buffer from the GE WGA kit was added to 10 μL of purified DNA. The DNA was cooled on a cool block or ice for 2 minutes. 20 μL of the DNA was added to the Ready-To-Go GenomiPhi V3 cake (WGA). The WGA reaction was incubated at 30° C. for 1.5 hours, followed by heat inactivation at 65° C. for 10 minutes.

The sample was purified using AMPure XP magnetic beads (1.6×). The beads were vortexed, and 80 μL was aliquoted into 1.5 mL tubes. 30 μL water, 20 μL amplified DNA, and the 80 μL of beads were combined and incubated at room temperature for 3 minutes. The tubes were placed on a magnetic stand for 2 minutes, and the clear solution was pipetted out. The beads were washed twice with 80% ethanol. DNA was eluted by adding 200 μL of 10 mM Tris-Cl pH 8.0, and the DNA-bead mixture was incubated at 65° C. for 5 minutes. The tubes were placed back on the magnetic stand for 2 minutes, and 195 μL of DNA was transferred to a new tube. 1 μL was used for quantification by Qubit. Finally, 130 μL of WGA product was sonicated using a Covaris S220 to a size of around 400 bp.

Example 6: Analysis of Ligation Efficiency and On-Target Rates

cfDNA that was circularized and subjected to whole genome amplification as in the above examples was analyzed by quantitative PCR (qPCR). The qPCR amplification curves for a sample target (using KRAS primers) are shown in FIGS. 18A and 18B. As shown in FIG. 18A, qPCR amplification of one-tenth of the input cfDNA gave an average Ct (cycle threshold) of 31.75, and one-tenth of the same sample's ligation product gave an average Ct of 31.927, indicating a high ligation efficiency of about 88%. Ligation efficiency may be about or more than about 70%, 80%, 90%, 95%, or more, such as about 100%. In some examples, linear DNA that was not circularized is removed, such that substantially all amplified DNA derives from circular forms. Each sample was run three times, in duplicate. As shown in FIG. 18B, the amplification curves of 10 ng of WGA product and reference genomic DNA (gDNA) (12878, 10 ng) virtually overlap. The average Ct for the WGA sample was 26.655, while that of the gDNA sample was 26.605, indicating a high on-target rate of over 96%. The number of KRAS copies in a given amount of amplified DNA was comparable to that in unamplified gDNA, indicating an unbiased amplification process. Each sample was tested three times, in duplicate. For contrast, the circularization protocol provided in Lou et al. (PNAS, 2013, 110 (49)) was also tested. Using the Lou method, which lacks the precipitation and purification steps of the examples described above, only 10-30% of linear input DNA was converted to circular DNA. Such low recovery presents a challenge to downstream sequencing and variant detection.
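The efficiency and on-target figures above are consistent with the usual ΔCt approximation, in which each cycle doubles the template, so the relative amount is 2^(−ΔCt): 2^(−(31.927 − 31.75)) ≈ 0.88 and 2^(−(26.655 − 26.605)) ≈ 0.97. The text does not state the formula used; the sketch below assumes 100% PCR efficiency, and `relative_amount` is an illustrative name.

```python
def relative_amount(ct_sample, ct_reference, efficiency=1.0):
    """Relative template amount of sample vs. reference from Ct values,
    assuming amplification by a factor (1 + efficiency) per cycle."""
    return (1 + efficiency) ** -(ct_sample - ct_reference)


ligation_efficiency = relative_amount(31.927, 31.75)  # ligation product vs. input cfDNA
on_target = relative_amount(26.655, 26.605)           # WGA product vs. gDNA
print(f"{ligation_efficiency:.1%}, {on_target:.1%}")  # 88.5%, 96.6%
```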

Example 7: Analysis of Amplified Circularized DNA by ddPCR

Droplet digital PCR (ddPCR) was used to assess allele frequency preservation and bias in whole genome amplification products generated from circularized polynucleotides. In general, ddPCR refers to a digital PCR assay that measures absolute quantities by counting nucleic acid molecules encapsulated in discrete, volumetrically defined, water-in-oil droplet partitions that support PCR amplification (Hindson et al., 2011, Anal. Chem. 83:8604-8610; Pinheiro et al., 2012, Anal. Chem. 84:1003-1011). A single ddPCR reaction may comprise at least 20,000 partitioned droplets per well. Droplet digital PCR may be performed using any platform that performs such an assay. A typical strategy for droplet digital PCR may be summarized as follows: a sample is diluted and partitioned into thousands to millions of separate reaction chambers (water-in-oil droplets) so that each contains one or no copies of the nucleic acid molecule of interest. The number of “positive” droplets detected, which contain the target amplicon (i.e., the nucleic acid molecule of interest), versus the number of “negative” droplets, which do not, may be used to determine the number of copies of the nucleic acid molecule of interest in the original sample. Examples of droplet digital PCR systems include the QX100™ Droplet Digital PCR System by Bio-Rad, which partitions samples containing nucleic acid template into 20,000 nanoliter-sized droplets, and the RainDrop™ digital PCR system by RainDance, which partitions samples containing nucleic acid template into 1,000,000 to 10,000,000 picoliter-sized droplets. Additional examples of methods for ddPCR are provided in WO2013181276A1.
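The positive/negative counting described above is usually paired with a Poisson correction, because random partitioning can place more than one copy in some droplets. The sketch below uses the standard relation that the fraction of negative droplets equals exp(−λ), where λ is the mean number of copies per droplet. This correction is not described in the text, and the droplet counts are hypothetical.

```python
import math


def copies_per_droplet(positive, total):
    """Mean target copies per droplet from the fraction of negative droplets,
    using the Poisson relation P(0 copies) = exp(-lambda)."""
    negative = total - positive
    if negative <= 0:
        raise ValueError("all droplets positive; sample too concentrated")
    return -math.log(negative / total)


def copies_in_reaction(positive, total):
    """Estimated number of target copies across all analyzed droplets."""
    return copies_per_droplet(positive, total) * total


lam = copies_per_droplet(positive=4000, total=20000)
print(round(lam, 4), round(copies_in_reaction(4000, 20000)))  # 0.2231 4463
```

The correction matters most at high occupancy; at low occupancy the estimate approaches the simple count of positive droplets.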

In this example, BRAF V600E genomic DNA (gDNA) from a melanoma cell line was mixed with reference genome DNA 12878 at specific proportions (0%, 0.67%, 2.0%, 6.67%, 20%, or 100%) and fragmented to generate fragments of a size resembling those found in cfDNA (in this case, about 150 bp). The mixed DNA samples (10 ng) were circularized and amplified according to Example 2. 40 ng of amplified DNA was subjected to ddPCR for BRAF V600E and wild type. The observed mutant allele frequencies are illustrated graphically and tabulated in FIG. 19. As shown, the observed mutant allele frequency with amplification (middle row of the FIG. 19 table) reflects both the input mutant allele frequency (top row) and the ddPCR result from 100 ng of genomic DNA without amplification (bottom row). The allele frequency output by ddPCR is calculated as the number of droplets containing the BRAF mutation divided by the sum of droplets containing the mutant and wild type sequences. DNA with amplification is indicated as an open circle, and DNA without amplification as a small filled circle. With the exception of a small deviation at 0.67%, the two data sets overlap completely. This demonstrates that the true mutant allele frequency is preserved, substantially without bias.
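The allele-frequency definition used here (mutant-containing droplets over the sum of mutant- and wild-type-containing droplets) reduces to a one-line calculation. A minimal sketch with hypothetical droplet counts chosen to match the 2.0% input level; the function name is illustrative.

```python
def mutant_allele_frequency(mutant_droplets, wild_type_droplets):
    """Mutant-containing droplets / (mutant- + wild-type-containing droplets)."""
    total = mutant_droplets + wild_type_droplets
    if total == 0:
        raise ValueError("no positive droplets")
    return mutant_droplets / total


# Hypothetical droplet counts for the 2.0% input mixture.
print(f"{mutant_allele_frequency(134, 6566):.2%}")  # 2.00%
```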

Example 8: Detection of Sequence Variants Above Background

10 ng of sonicated gDNA (150 bp, Multi-Gene Multiplex reference DNA, Horizon) was circularized and amplified as described in Example 2, followed by sonication. The fragmented DNA was then subjected to Rubicon sequencing library construction. After capture sequencing, variants within 50 bp of reference hotspots were plotted. Results for variant detection in which calling a variant required its detection in two different polynucleotides distinguished by different junctions are shown in FIG. 20. The seven expected reference hotspots (KIT D816V, EGFR G719S, EGFR T790M, EGFR L858R, KRAS G13D, KRAS G12D, NRAS Q61K) are plotted at position 0. Two other variants were also confirmed, illustrated as the open triangle and diamond in FIG. 20.
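The calling rule used for FIG. 20, in which a sequence difference is accepted only when it is found in at least two different polynucleotides distinguished by their junctions, can be sketched as grouping observations by variant and counting distinct junctions. How junction identifiers are derived from reads is not specified here; the identifiers, variant labels, and function name below are hypothetical.

```python
from collections import defaultdict


def call_confirmed_variants(observations, min_molecules=2):
    """Return variants supported by at least `min_molecules` distinct source
    polynucleotides, as identified by their junctions.

    observations: iterable of (variant, junction_id) pairs, one per read or
    repeat copy in which the sequence difference was seen.
    """
    molecules = defaultdict(set)
    for variant, junction_id in observations:
        molecules[variant].add(junction_id)
    return {v for v, junctions in molecules.items() if len(junctions) >= min_molecules}


obs = [
    ("KRAS G12D", "junction-1"),
    ("KRAS G12D", "junction-1"),   # same molecule again: adds no independent support
    ("KRAS G12D", "junction-7"),
    ("EGFR T790M", "junction-3"),  # seen in only one molecule
    ("EGFR T790M", "junction-3"),
]
print(call_confirmed_variants(obs))  # {'KRAS G12D'}
```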

For comparison, gDNA was sonicated as above, but 10 ng of the sonicated gDNA was directly subjected to Rubicon sequencing library construction according to common practice, without circularization and without requiring confirmation of a sequence variant on two different polynucleotides. After capture sequencing, variants within 50 bp of reference hotspots were again plotted, with results shown in FIG. 21. The seven expected reference hotspots (KIT D816V, EGFR G719S, EGFR T790M, EGFR L858R, KRAS G13D, KRAS G12D, NRAS Q61K) are plotted at position 0. The variants at other positions were not expected and are most likely due to sequencing errors. In contrast to the results of the method used to generate FIG. 20, the results in FIG. 21 indicate that standard sequencing methods have a much higher random error rate, which can mask true mutation signal when allele frequency is low (such as below 5%).

Results of a separate analysis of sensitivity and background noise detected by sequencing methods with and without requiring detection in two different polynucleotides are illustrated in FIGS. 16-17. As these figures illustrate, the validation requirement greatly reduces background noise and increases sensitivity.

Example 9: Analysis of GC Composition and Size Distribution

10 ng of sonicated gDNA (150 bp, Multi-Gene Multiplex reference DNA, Horizon) was circularized and amplified as in Example 5, sequenced, and analyzed with the two-polynucleotide verification filter for variant calling. The number of sequences across a range of GC percentages was tabulated and plotted, as shown in FIG. 22. As shown in the far left plot, sequences from samples prepared according to Example 5 largely resemble the theoretical distribution, except at the central peak (which corresponds to the overall GC content of the underlying genome). By contrast, when the same amount of gDNA was used directly to construct a sequencing library without amplification using a Rubicon sequencing library construction kit, the difference between the sequencing result and the theoretical distribution is very apparent (see the middle plot): the central peak of this direct Rubicon sequencing is higher than the theoretical distribution. Newman et al. (2014; Nature Medicine 20:548-554) reported that the cfDNA sequencing GC content distribution was similar to the theoretical distribution when 32 ng of cfDNA was used. This is illustrated in the far right plot.
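The GC-content distributions in FIG. 22 are histograms of per-sequence GC percentage. A minimal sketch of the observed histogram; the 5% bin width and the function names are assumptions, and the theoretical distribution used for comparison is not reproduced here.

```python
from collections import Counter


def gc_percent(seq):
    """Percentage of G or C among A/C/G/T calls (N and other symbols ignored)."""
    acgt = [b for b in seq.upper() if b in "ACGT"]
    if not acgt:
        return None
    return 100.0 * sum(b in "GC" for b in acgt) / len(acgt)


def gc_histogram(reads, bin_width=5):
    """Count reads per GC-percentage bin; keys are the bin lower edges."""
    counts = Counter()
    for read in reads:
        gc = gc_percent(read)
        if gc is not None:
            counts[int(gc // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


print(gc_histogram(["GGCC", "ATAT", "ACGT", "GCAT"]))  # {0: 1, 50: 2, 100: 1}
```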

DNA size distribution was assessed for cfDNA that had been circularized, amplified, and sequenced as in Example 5. As shown in FIG. 23, the peak of the distribution of fragment lengths indicated by the sequencing results is at about 150-180 bp, which resembles the typical distribution pattern of cfDNA.

Example 10: Assessment of Amplification Uniformity

The qPCR results of 10 PCR products from DNA circularized and amplified according to Example 5 were compared to unamplified reference DNA (gDNA from the 12878 cell line, Coriell Institute). 10 ng of genomic reference DNA or amplification product was used for each real-time qPCR reaction, and ratios were generated by relative quantification of amplification product over genomic reference. As shown in FIG. 24, the ratio for each PCR is within a 2-fold change, suggesting that the copy numbers of these targets in the amplified DNA pool are very similar to those in the unamplified reference DNA. The 10 pairs of PCR primers from 6 genes (BRAF, cKIT, EGFR, KRAS, NRAS, PIK3CA) had been designed and validated previously.

Example 11: Quantification of Amplification Yield of DNA Fragments

cfDNA was isolated from four patients (patients 1-4) and one healthy control. Genomic DNA (gDNA, Multi-Gene Multiplex, Horizon) was sonicated to approximately 150 bp fragments. DNA was circularized and amplified with random primers. Table 7 shows the amount of DNA input into the amplification reaction and the amount of DNA produced by amplification. Significant amplification was obtained for even the smallest sample (0.4 ng), and all samples were amplified at least 600-fold.

TABLE 7
Sample Type | Input (ng) | Yield (ng)
gDNA | 10 | 6100
gDNA | 4 | 6880
gDNA | 2 | 5700
gDNA | 1 | 5760
gDNA | 0.5 | 3280
cfDNA healthy control | 0.4 | 6240
cfDNA patient1 | 4 | 13800
cfDNA patient2 | 3 | 9480
cfDNA patient3 | 3 | 7840
cfDNA patient4 | 1 | 3180
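The fold-amplification figures quoted for Table 7 follow directly from yield divided by input mass. The snippet below is a small arithmetic check on the transcribed table values; it measures DNA mass only and says nothing about how many distinct input molecules are represented in the product.

```python
# (sample, input ng, yield ng), transcribed from Table 7
TABLE_7 = [
    ("gDNA", 10, 6100),
    ("gDNA", 4, 6880),
    ("gDNA", 2, 5700),
    ("gDNA", 1, 5760),
    ("gDNA", 0.5, 3280),
    ("cfDNA healthy control", 0.4, 6240),
    ("cfDNA patient1", 4, 13800),
    ("cfDNA patient2", 3, 9480),
    ("cfDNA patient3", 3, 7840),
    ("cfDNA patient4", 1, 3180),
]


def fold_amplification(input_ng: float, yield_ng: float) -> float:
    """Mass-based fold amplification: output DNA mass over input DNA mass."""
    return yield_ng / input_ng


if __name__ == "__main__":
    for name, input_ng, yield_ng in TABLE_7:
        fold = fold_amplification(input_ng, yield_ng)
        print(f"{name:<22} {input_ng:>5} ng -> {yield_ng:>6} ng  {fold:8.0f}-fold")
```

The smallest ratio in the table is the 10 ng gDNA sample (610-fold), which is the basis for the "at least 600-fold" statement.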

Example 12: Detecting Low-Frequency Mutations from cfDNA of Cancer Patients

In step 1, cfDNA was circularized. The circle ligation mix was prepared in a PCR tube at room temperature. 4 ng-10 ng of cfDNA in a volume of 12 μl was pipetted into the PCR tube. DNA was denatured at 96° C. for 30 seconds, and the PCR tubes were then chilled on ice for 2 minutes. Ligation mix (2 μl of 10× CircLigase buffer, 4 μl of 5 M betaine, 1 μl of 50 mM MnCl2, 1 μl of CircLigase II) was added to each tube, and the reaction proceeded at 60° C. for 16 hours on a PCR machine.
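As a worked example, the final concentrations in the 20 μl circularization reaction follow from simple dilution (C_final = C_stock × V_added / V_total). The sketch assumes the component volumes listed in step 1 are the only contents of the tube; the dictionary layout is illustrative.

```python
def final_concentration(stock: float, volume_added_ul: float, total_volume_ul: float) -> float:
    """Simple dilution: C_final = C_stock * V_added / V_total."""
    return stock * volume_added_ul / total_volume_ul


# Step 1 components: name -> (stock concentration, or None if not applicable; volume in μl)
STEP1 = {
    "cfDNA": (None, 12),
    "CircLigase buffer (x)": (10, 2),
    "betaine (M)": (5, 4),
    "MnCl2 (mM)": (50, 1),
    "CircLigase II": (None, 1),
}
TOTAL_UL = sum(volume for _, volume in STEP1.values())  # 12 + 2 + 4 + 1 + 1 = 20 μl

if __name__ == "__main__":
    for name, (stock, volume) in STEP1.items():
        if stock is not None:
            print(f"{name}: {final_concentration(stock, volume, TOTAL_UL)}")
```

Under this assumption the reaction runs at 1× CircLigase buffer, 1 M betaine, and 2.5 mM MnCl2.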

In step 2, the ligation reactions were treated to remove unligated linear DNA. 1 μl of exonuclease mix (NEB M0206S, M0293S; ExoI 20 u/μl:ExoIII 100 u/μl=1:2) was added to each tube, mixed, and incubated at 37° C. for 30 minutes in a PCR machine.

In step 3, the ligation reaction was purified for buffer exchange. The ligation product was purified with Oligo Clean & Concentrator (Zymo Research). Binding mix (30 μl of 10 mM Tris, 100 μl of Oligo binding buffer, 400 μl of 100% ethanol) was added to the ligation reaction after exonuclease treatment, mixed, and briefly spun down. Zymo-spin columns were loaded and spun at greater than 10,000×g for 30 seconds. Columns were washed with 750 μl of DNA wash buffer and centrifuged at 14,000×g for 1 minute. DNA was eluted with 15 μl of 10 mM Tris by centrifugation at greater than 10,000×g for 30 seconds.

In step 4, the DNA was amplified by random priming. Whole genome amplification (WGA) was performed with the Ready-To-Go GenomiPhi V3 DNA Amplification Kit (GE Healthcare). 10 μl of purified ligation product was mixed with 10 μl of 2× denaturation buffer, incubated at 95° C. for 3 minutes, and then cooled to 4° C. on ice. 20 μl of denatured DNA was added to the WGA pre-mix, and samples were incubated at 30° C. for 1.5 hours, followed by inactivation at 65° C. for 10 minutes.

In step 5, the amplification products were cleaned up using Agencourt AMPure XP purification (1.6×) (Beckman Coulter). 30 μl of 10 mM Tris and 80 μl of AMPure beads were added to 20 μl of WGA reaction. The mixture was incubated at room temperature for 2 minutes. The tubes were placed on a magnet stand and incubated for 2 minutes. The supernatants were removed and discarded. Samples were washed twice with 200 μl of 80% ethanol and air dried for 5 minutes, and the DNA was eluted with 200 μl of 10 mM Tris pH 8.0.

In step 6, WGA DNA was fragmented. 130 μl of WGA product was sonicated using a Covaris S220 sonicator to obtain a fragment size of approximately 400 bp. Covaris S220 settings were as follows: peak incident power=140 W, duty factor=10%, cycles per burst=200, treatment time=55 seconds.

In step 7, samples were quantified by qPCR. 1/10 of the ligation input and of the ligation product were used for qPCR reactions, in triplicate, to measure ligation efficiency. 10 ng of the fragmented WGA product, along with 10 ng of reference gDNA (12878 cell line), was used for qPCR to measure the on-target rate. Reactions comprised 5 μl of 2× master mix (TaqMan Fast Universal PCR master mix (2×), Applied Biosystems; EvaGreen dye, Biotium), 0.5 μl of primer (5 μM), 1.2 μl of H2O, and 10 μl of DNA. Amplification proceeded according to the following program: 95° C., 2 minutes; and 40 cycles of [95° C., 10 seconds; 60° C., 20 seconds].

In step 8, sequencing libraries were constructed. Sequencing libraries were prepared from 500-1000 ng of sonicated amplified DNA using the KAPA Hyper Prep Kit (KK8500) or the KAPA Library Preparation Kit with Standard PCR Library (KK8200). Adaptor ligations (with 1 μM adaptor final concentration) were prepared according to the manufacturer's protocol. For clean-up of the adaptor-ligated product, 30 μl (0.3×) of 20% PEG 8000/2.5 M NaCl solution was added to 100 μl of the resuspended ligated product. Beads were mixed thoroughly with the ligated product and incubated at room temperature for 15 minutes. Beads were then captured on a magnet until the liquid was clear. 130 μl of supernatant was then subjected to size selection using AMPure XP beads. Samples were transferred to a new plate, followed by the addition of 20 μl of AMPure XP beads (0.5×). The ligated product, now captured on the beads, was washed two times with 200 μl of 80% ethanol. The ligated product was then resuspended and eluted in 20 μl of EB buffer. After size selection and purification, 20 μl of ligated product was added to 25 μl of 2× KAPA HiFi HotStart ReadyMix and 5 μl of 10 μM P5+P7 primers (5′CAAGCAGAAGACGGCATACGA3′ (SEQ ID NO: 1), 5′ACACTCTTTCCCTACACGACGCTCTTCCGATCT3′ (SEQ ID NO: 2)) to amplify the library using the following cycling program: 98° C., 45 seconds; 5 cycles of (98° C., 15 seconds; 60° C., 30 seconds; 72° C., 30 seconds); 72° C., 60 seconds. The amplified library was diluted 20× before loading on a fragment analyzer or bioanalyzer (high sensitivity chip) for quantitation. Further size selection was done via a gel size selector (Blue Pippin prep, Sage Science).
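The stated 0.5× for the second addition is easiest to read as a cumulative ratio relative to the original 100 μl sample (30 μl + 20 μl = 50 μl), which is the usual convention for double-sided size selection; 20 μl added to the 130 μl supernatant would otherwise be only about 0.15×. This interpretation is an assumption, and the helper below is a hypothetical sketch of it.

```python
from typing import List


def cumulative_bead_ratios(sample_volume_ul: float, additions_ul: List[float]) -> List[float]:
    """Cumulative bead (PEG/NaCl) to sample ratio after each addition,
    expressed relative to the original sample volume."""
    ratios, added = [], 0.0
    for volume in additions_ul:
        added += volume
        ratios.append(added / sample_volume_ul)
    return ratios


if __name__ == "__main__":
    # 30 μl PEG/NaCl to 100 μl ligated product, then 20 μl AMPure XP to the supernatant.
    print(cumulative_bead_ratios(100, [30, 20]))
```

The first cut at 0.3× removes large fragments (they stay on the beads and are discarded), and raising the cumulative ratio to 0.5× then captures the desired size range from the supernatant.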

In step 9, the sequencing library was enriched by probe capture using probes from the xGEN Pan-Cancer Panel v1.5, 127908597 (IDT). In step 10, the library was sequenced on a HiSeq 2500 with an average depth of 1000×.

In step 11, sequencing data were analyzed to make variant calls. Variant calling included a step requiring that a sequence difference occur on two different polynucleotides (e.g., identified by different junctions) to be counted as a variant. Several somatic mutations were detected, and they were also reported in a public database (COSMIC, the Catalog of Somatic Mutations in Cancer). Among the mutations identified was BRAF V600M at a 0.05% allele frequency, which demonstrates the high sensitivity of this system even when the input is low. Results for the detection of various mutations, including their frequency in the sample, are shown in Table 8.
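A minimal sketch of the two-polynucleotide requirement, assuming each read has already been aligned and reduced to (sequence difference, junction) pairs, with the junction serving as a stand-in for the source polynucleotide. The function name and input layout are hypothetical; as discussed in Example 15, distinct molecules can share a junction, which this simple version does not account for.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Set, Tuple


def two_polynucleotide_filter(
    observations: Iterable[Tuple[Hashable, Hashable]],
    min_polynucleotides: int = 2,
) -> Set[Hashable]:
    """Keep sequence differences seen on at least `min_polynucleotides` distinct
    junctions, treating each junction as one source polynucleotide.

    observations: (sequence_difference, junction) pairs, one per read showing the
    difference; extra reads with an already-seen junction add no support."""
    junctions_seen = defaultdict(set)
    for difference, junction in observations:
        junctions_seen[difference].add(junction)
    return {d for d, js in junctions_seen.items() if len(js) >= min_polynucleotides}


if __name__ == "__main__":
    # Hypothetical observations: junctions given as (reference start, reference end).
    obs = [
        ("diff_A", (100, 250)), ("diff_A", (100, 250)),  # one polynucleotide, two reads
        ("diff_B", (100, 250)), ("diff_B", (120, 260)),  # two polynucleotides
    ]
    print(two_polynucleotide_filter(obs))
```

Here `diff_A` is rejected despite appearing in two reads, because both reads come from the same circularized molecule and could share a single amplification error.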

TABLE 8
Cancer type | Mutated gene | Mutation | Frequency in sample
Melanoma | FGFR2 | V191I | 0.20%
Melanoma | NRAS | Q61K | 0.20%
Melanoma | CTNNB1 | S37F | 0.10%
Melanoma | BRAF | V600E | 0.10%
Melanoma | BRAF | V600M | 0.05%
Breast cancer | PIK3CA | H1047L | 53%
Breast cancer | LIFR | S679L | 0.60%
Breast cancer | KRAS | G12D | 0.60%
Breast cancer | ATM | R2993* | 0.30%
Breast cancer | ATRX | S618F | 0.10%
Prostate | AR | T346A | 6.40%
Prostate | SPOP | F133L | 3.30%
Prostate | BRAF | V600M | 1.10%
Prostate | NSD1 | R1557C | 1.00%
Prostate | SF3B1 | K700E | 1.60%
Pancreatic | KRAS | G12V | 1.70%
Pancreatic | TP53 | 7578176C > T (splice donor variant) | 0.5%

Example 13: Accurate Mutation Detection from Multiplex Reference DNA from FFPE Sample

DNA was extracted from the Horizon FFPE multiplex sample (HD200) following the manufacturer's protocol (Covaris truXTRAC™ FFPE DNA Kit). 130 μl of FFPE gDNA was sonicated using a Covaris S220 sonicator to obtain a fragment size of approximately 150 bp (Covaris S220 settings: peak incident power=175 W, duty factor=10%, cycles per burst=200, treatment time=430 seconds). 50 ng of DNA in an 11 μl volume was denatured at 95° C. for 30 seconds. 10.5 μl of H₂O, 2.5 μl of ligase buffer (NEB B0202S), and 1 μl of T4 Polynucleotide Kinase (NEB M0201S) were added. Reactions were incubated at 37° C. for 30 minutes to phosphorylate the DNA.

Samples were ligated, then purified with Oligo Clean & Concentrator (Zymo Research). Binding mix (30 μl of 10 mM Tris, 100 μl of Oligo binding buffer, 400 μl of 100% ethanol) was added to the ligation reaction after exonuclease treatment, mixed by vortexing, and spun briefly. Samples were loaded onto a Zymo-spin column and spun at greater than 10,000×g for 30 seconds. Columns were washed with 750 μl of DNA wash buffer and centrifuged at 14,000×g for 1 minute. DNA was eluted with 15 μl of 10 mM Tris by centrifugation at greater than 10,000×g for 30 seconds.

Samples were further processed and analyzed according to steps 5-11 in Example 12. Results are summarized in Table 9. The representation of the nine mutations in Horizon's multiple mutation standard DNA was roughly retained by this process, while the quantity of DNA increased at least 600-fold.

TABLE 9
Gene | Mutation | Allele Frequency | Measured Allele Frequency
BRAF | p.V600E | 10.5% | 8%
KIT | p.D816V | 10% | 9.4%
EGFR | p.L858R | 3% | 4%
EGFR | p.T790M | 1% | 1%
EGFR | p.G719S | 24.5% | 14%
KRAS | p.G13D | 15% | 9.1%
KRAS | p.G12D | 6% | 5.4%
NRAS | p.Q61K | 12.5% | 8.3%
PIK3CA | p.H1047R | 17.5% | 14%

Example 14: Detecting Low-Frequency Mutation from Cancer Mutation Cell Line gDNA Multimers

In this example, sonicated genomic DNA was ligated to form multimers, which were then subjected to amplification, fragmentation, and analysis. FIGS. 25A and 25B illustrate an example of this process. An illustrative workflow is provided in FIG. 31 .

gDNA from the melanoma cell line SK-MEL-28 (ATCC), which contains the BRAF V600E mutation, was mixed with reference gDNA (12878, Coriell Institute) to achieve 1% BRAF V600E. DNA was sonicated as in Example 13 to obtain a fragment size of approximately 150 bp. 100 ng of DNA in an 11 μl volume was denatured at 95° C. for 30 seconds. 10.5 μl of H2O, 2.5 μl of ligase buffer (NEB B0202S), and 1 μl of T4 Polynucleotide Kinase (NEB M0201S) were added, followed by incubation at 37° C. for 30 minutes to phosphorylate the DNA.
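The mixing step can be expressed as a mass fraction. The sketch below assumes both gDNA sources contribute the same number of genome copies per nanogram and that the reference DNA carries no mutant allele. It introduces a hypothetical `cell_line_af` parameter for the fraction of BRAF alleles in the cell-line DNA that carry V600E; the text does not state this value, so the example uses illustrative inputs only.

```python
from typing import Tuple


def mix_for_target_af(total_ng: float, target_af: float, cell_line_af: float) -> Tuple[float, float]:
    """Return (mutant cell-line gDNA ng, reference gDNA ng) for a target allele frequency.

    Assumes equal genome copies per ng in both sources and a mutation-free reference;
    cell_line_af (hypothetical input) is the mutant fraction of the cell line's alleles."""
    if not 0 < target_af <= cell_line_af <= 1:
        raise ValueError("need 0 < target_af <= cell_line_af <= 1")
    mutant_ng = total_ng * target_af / cell_line_af
    return mutant_ng, total_ng - mutant_ng


if __name__ == "__main__":
    # Illustrative inputs only: 100 ng total at 1% AF, for two assumed cell-line AFs.
    for af in (1.0, 0.5):
        print(af, mix_for_target_af(100, 0.01, af))
```

If the cell line were heterozygous rather than homozygous at the locus, twice as much cell-line DNA would be needed for the same 1% target.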

Samples were purified with Oligo Clean & Concentrator (Zymo Research). This included adding binding mix (25 μl of 10 mM Tris, 100 μl of Oligo binding buffer, 400 μl of 100% ethanol) to the ligation reaction after exonuclease treatment. This was mixed by vortexing and spun briefly. A Zymo-spin column was loaded, spun at greater than 10,000×g for 30 seconds, washed with 750 μl of DNA wash buffer, and centrifuged at 14,000×g for 1 minute. DNA was eluted with 15 μl of 10 mM Tris by centrifugation at greater than 10,000×g for 30 seconds.

To ligate, 6 ng of DNA in a 4 μl volume was mixed with 0.45 μl of 10× end repair buffer (Enzymatics), 0.05 μl of 25 mM dNTP, 0.5 μl of 10 mM ATP, end repair enzyme mix (Enzymatics), and T4 ligase (2000 units/μl). The reaction was incubated at 25° C. for 30 minutes, followed by 75° C. for 20 minutes.

Whole genome amplification was performed with the Ready-To-Go GenomiPhi V3 DNA Amplification Kit (GE Healthcare). 8 μl of H₂O and 10 μl of purified ligation product were mixed with 10 μl of 2× denaturation buffer. DNA was denatured at 95° C. for 3 minutes and then cooled to 4° C. on ice. 20 μl of denatured DNA was added to the WGA pre-mix and incubated at 30° C. for 1.5 hours, followed by inactivation at 65° C. for 10 minutes.

The amplification reaction was then cleaned up using Agencourt AMPure XP purification (1.6×) (Beckman Coulter). 30 μl of 10 mM Tris and 80 μl of AMPure beads were added to 20 μl of WGA reaction. This was incubated at room temperature for 2 minutes. The tube was placed on a magnet stand and incubated for 2 minutes. The supernatant was removed and discarded. Beads were washed twice with 200 μl of 80% ethanol, then air dried for 5 minutes. DNA was eluted with 200 μl of 10 mM Tris pH 8.0. 130 μl of WGA product was then fragmented using the Covaris S220 sonicator to obtain a fragment size of approximately 400 bp (Covaris S220 settings: peak incident power=140 W, duty factor=10%, cycles per burst=200, treatment time=55 seconds).

Mutations were detected by ddPCR using Bio-Rad PrimePCR ddPCR mutation detection assays. The mutation-detection ddPCR reaction was assembled in a PCR tube at room temperature (80 ng of amplified DNA, 10 μl of 2× ddPCR Supermix for Probes, 1 μl of 20× target (BRAF V600E, Bio-Rad) primers (9 μM)/probe (FAM; 5 μM), 1 μl of 20× wild-type primers (9 μM)/probe (HEX; 5 μM), 8 μl of DNA sample (50 ng)). The reaction was mixed by pipetting up and down 5 times and then transferred to a droplet generator cartridge. Droplets were generated using the QX200 droplet generator, transferred into a 96-well PCR plate, and amplified using the following PCR program: 95° C., 10 minutes; 40 cycles of [94° C., 30 seconds; 55° C., 1 minute]; 98° C., 10 minutes. The PCR reaction plate was transferred into a QX200 droplet reader to quantify the result. Based on the input DNA, the expected frequency of the BRAF V600E mutation was 1%. With this ligation and amplification procedure, this frequency was roughly maintained (1.41% according to the ddPCR analysis), while the quantity of DNA increased about 200-fold.
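Droplet counts are commonly converted to allele fractions with a Poisson correction, because a positive droplet may hold more than one template copy. The sketch below shows that standard calculation for the FAM (mutant) and HEX (wild-type) channels, correcting each channel independently. It is a simplified illustration, not a description of how the instrument software computes its results, and the droplet counts in the usage line are hypothetical.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean copies per droplet: lambda = -ln(1 - positive/total)."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    return -math.log1p(-positive / total)


def mutant_fraction(fam_positive: int, hex_positive: int, total_droplets: int) -> float:
    """Fraction of mutant (FAM) copies among mutant + wild-type (HEX) copies."""
    lam_mut = copies_per_droplet(fam_positive, total_droplets)
    lam_wt = copies_per_droplet(hex_positive, total_droplets)
    if lam_mut + lam_wt == 0:
        raise ValueError("no positive droplets in either channel")
    return lam_mut / (lam_mut + lam_wt)


if __name__ == "__main__":
    # Hypothetical droplet counts, for illustration only.
    print(f"{mutant_fraction(150, 9000, 15000):.2%}")
```

The correction matters mostly for the abundant wild-type channel: when many droplets are HEX-positive, the raw positive count underestimates wild-type copies and would inflate the apparent mutant fraction.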

FIGS. 26 and 27 illustrate exemplary variations on the process of FIG. 25 .

Example 15: Detection of Rare Mutations in a Single Reaction Assay

The ability to detect rare mutations in circulating cell-free DNA using NGS is limited by two factors: the first is the accuracy of the sequencing technology; the second is the overall conversion rate of the assay. Described herein is a method for sensitive detection of rare variants in a targeted region using a single reaction assay workflow and an algorithm for accurate variant calling. An exemplary workflow for a single reaction assay is provided in FIGS. 34A and 34B.

Linear single-stranded polynucleotides are circularized by end-joining. Where single-stranded circles are desired, the polynucleotide may be a single-stranded polynucleotide as originally isolated, or may be treated to render the polynucleotide single-stranded (e.g., by denaturation). In this example, a method for circularizing a polynucleotide involves an enzyme, such as a ligase (e.g., an RNA ligase or a DNA ligase). Reaction conditions are those specified by the manufacturer of the selected enzyme. Joining the ends of a polynucleotide to form a circular polynucleotide (either directly to itself or to one or more other polynucleotides, e.g., a circular target polynucleotide comprising two target polynucleotides) produces a junction having a junction sequence.

After circularization, an exonuclease step is included to digest any unligated nucleic acids. Closed circles do not contain a free 5′ or 3′ end, so a 5′ or 3′ exonuclease will not digest the closed circles but will digest the unligated components. In some cases, this finds particular use in multiplex systems.

After circularization, ligase may remain bound to the ligated molecules. The presence of ligase at the ligation junction may block primer extension during the polymerization reaction and reduce amplification efficiency (as illustrated in FIG. 34A). Removing the ligase used in the circularization reaction by degradation allows the polymerase to extend through the junction and effectively amplify the circularized target molecules as concatemers. Degradation of ligase comprises treatment with a heat-labile protease, such as Qiagen protease, which can be heat-inactivated by incubation at 70° C. for 15 minutes.

After removal of ligase, circularized DNA molecules can be amplified with random primers for whole genome amplification or with specific primers for targeted amplification. The amplified concatemers are then randomly sonicated to create fragments in the size range of ~500-1000 bp for NGS library construction and sequencing.

Improvements to the algorithm in this example are illustrated in FIG. 35 . The junctions (characterized by their start and end positions in the reference) in the concatemer sequences can be identified through alignment to reference sequences and can be used as tags for the original input DNA fragments. However, the number of junction types (combinations of start and end positions in the reference) may be limited in a given targeted region, and when a large number of input DNA fragments is present, there is a relatively high chance that two independent input DNA fragments share the same junction. Consequently, the "junction tags" may not be treatable as unique tags in many applications, and error correction and molecular counting based on such tags may not distinguish molecules uniquely.
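To make the collision argument concrete: if a targeted region allows only a limited number of junction types and fragments take them at random, the chance that two fragments share one grows quickly with the number of fragments (the birthday problem). The sketch below assumes junction types are drawn uniformly and independently, which is a simplification; any non-uniformity in real junction usage only increases the collision probability.

```python
def junction_collision_probability(n_fragments: int, n_junction_types: int) -> float:
    """P(at least two fragments share a junction type), assuming each fragment's
    junction type is drawn uniformly and independently from n_junction_types options."""
    if n_junction_types <= 0:
        raise ValueError("need at least one junction type")
    if n_fragments > n_junction_types:
        return 1.0  # pigeonhole: a shared junction is certain
    p_all_distinct = 1.0
    for i in range(n_fragments):
        p_all_distinct *= (n_junction_types - i) / n_junction_types
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 input fragments over 100,000 possible junction types.
    print(f"{junction_collision_probability(1_000, 100_000):.3f}")
```

The probability of at least one collision becomes large once the number of fragments approaches the square root of the number of junction types, which is why adding shredding points to the tag helps.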

By combining concatemer shredding points with the junctions in the concatemer sequences, more distinct tags can be created for the input molecules. As illustrated in FIG. 35 , a long string of concatemers, which can be generated after circle ligation and RCA, is shredded using sonication or other methods of fragmentation (e.g., enzymatic cleavage) to form shorter concatemers. This results in many concatemers with different structures: for example, different numbers of repeats, or different starting/ending positions in reads relative to the junctions (the concatemer shredding points at the two ends).

Incorporating the concatemer shredding points into distinct tags has two benefits. First, it becomes possible to perform additional error correction based on consensus sequences built from the read families identified by the distinct tags. Various voting schemes are used to build consensus sequences from the families of reads and to perform variant calling; e.g., a variant is called only if all the reads in the family report the same variant. Second, the distinct tags can be used to help count input molecules and compute allele frequencies.
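The unanimous voting scheme mentioned here, and the majority scheme mentioned in Example 18, can be sketched per position as follows. The function and its `scheme` parameter are hypothetical illustrations, assuming each read family has already been reduced to the bases its reads report at one position.

```python
from collections import Counter
from typing import Optional, Sequence


def family_consensus(bases: Sequence[str], scheme: str = "unanimous") -> Optional[str]:
    """Consensus base at one position for a read family (reads sharing one distinct tag).

    "unanimous": the base is returned only if every read agrees.
    "majority":  the most common base is returned if more than half the reads carry it.
    None means no consensus was reached."""
    if not bases:
        return None
    base, count = Counter(bases).most_common(1)[0]
    if scheme == "unanimous":
        return base if count == len(bases) else None
    if scheme == "majority":
        return base if 2 * count > len(bases) else None
    raise ValueError(f"unknown voting scheme: {scheme!r}")


if __name__ == "__main__":
    family = ["T", "T", "C"]  # hypothetical bases from three reads in one family
    print(family_consensus(family), family_consensus(family, "majority"))
```

The unanimous scheme is stricter: a single disagreeing read, for example one carrying a sequencing error, is enough to withhold a consensus, trading some sensitivity for specificity.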

An example algorithm to identify the concatemer shredding points is as follows: 1) identify, in the concatemer sequences (reads), the repeat length, repeat regions, and junctions, which can be done by self-alignment, alignment to reference sequences, or other computational methods; 2) determine the positions of the junctions within the reads by aligning the reads to the reference sequence; 3) calculate the positions of the concatemer shredding points (at both ends) by shifting from the reference start or end positions of the junction, where the number of bases to shift is determined by the read positions of the adjacent junctions.
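Step 3 can be sketched for the simplest case. The coordinate conventions here are assumptions, not taken from the source: a forward-strand read, an insert spanning reference positions `junction_ref_start`..`junction_ref_end`, and junction read offsets given as the 0-based index of the first base after each end-to-start junction. Bases before the first junction belong to the tail of the insert, so the 5′ shredding point is found by shifting back from the end position; bases after the last junction belong to the head, so the 3′ point is found by shifting forward from the start position.

```python
from typing import Sequence, Tuple


def shredding_points(junction_ref_start: int, junction_ref_end: int,
                     junction_read_offsets: Sequence[int],
                     read_length: int) -> Tuple[int, int]:
    """Return the reference positions of a read's 5' and 3' concatemer shredding points.

    Hypothetical conventions: forward-strand read; the insert spans
    junction_ref_start..junction_ref_end (inclusive); each offset is the 0-based
    read index of the first base mapping back to junction_ref_start."""
    if not junction_read_offsets:
        raise ValueError("read contains no junction")
    offsets = sorted(junction_read_offsets)
    insert_len = junction_ref_end - junction_ref_start + 1
    # Adjacent junctions should be exactly one repeat (insert length) apart.
    for a, b in zip(offsets, offsets[1:]):
        if b - a != insert_len:
            raise ValueError("junction spacing does not match the repeat length")
    first, last = offsets[0], offsets[-1]
    if first > insert_len or not 0 < read_length - last <= insert_len:
        raise ValueError("a junction appears to be missing near a read end")
    # Read bases [0, first) are the tail of the insert: shift back from the end position.
    left = junction_ref_end - first + 1 if first > 0 else junction_ref_start
    # Read bases [last, read_length) are the head of the insert: shift forward from the start.
    right = junction_ref_start + (read_length - last) - 1
    return left, right


if __name__ == "__main__":
    # Hypothetical 300-base read of a 150-base insert (positions 100-249), junctions at 50 and 200.
    print(shredding_points(100, 249, [50, 200], 300))
```

Combining the junction (start, end) with the two shredding points then gives a tag such as ((100, 249), 200, 199) for the read in the usage line.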

Example 16: qPCR Analysis of Ligase Treated cfDNA Molecules with or without Ligase Removal

Ligase-treated cfDNA samples circularized and purified under various conditions were analyzed by qPCR. Three purification conditions were included in the test: 1. Circ-ligated DNA purified by chromatography columns; 2. Circ-ligated DNA purified by phenol chloroform; and 3. Circ-ligated DNA treated with proteinase K to remove ligase before purification by chromatography column. 10 ng of cfDNA was used for each condition (1, 2, and 3). cfDNA was denatured at 96° C. for 30 seconds and chilled on an ice block for 2 minutes, followed by addition of ligation mix (2 μL of 10× CircLigase buffer, 4 μL of 5 M betaine, 1 μL of 50 mM MnCl₂, 1 μL of CircLigase II (Epicentre #CL9025K)). A ‘no ligase control’ (4) and a ‘no DNA control’ (5) were set up at the same time. The ‘no ligase control’ contained 10 ng of cfDNA mixed with 2 μL of 10× CircLigase buffer, 4 μL of 5 M betaine, and 1 μL of 50 mM MnCl₂, but no ligase. The ‘no DNA control’ contained all the ligation reagents but no cfDNA. All reactions were incubated at 60° C. for 1 hour.

After ligation, the ‘no ligase control’ (4), the ‘no DNA control’ (5), and condition 1 (Circ-ligated DNA purified by chromatography columns) were purified by chromatography columns. Each reaction was loaded on a Micro Bio-Spin P-6 column pre-washed with 10 mM Tris-Cl pH 8.0 and eluted by centrifuging the column for 4 minutes at 1,000×g.

Condition 2 (Circ-ligated DNA purified by phenol chloroform) was purified by phenol chloroform extraction and ethanol precipitation. 180 μL of 10 mM Tris was added to 20 μL of DNA from the exonuclease treatment to make a volume of 200 μL, and 200 μL of phenol was used to extract DNA. The aqueous layer was collected, and the DNA was recovered by ethanol precipitation. The ethanol coprecipitant mix (200 μL of DNA solution after phenol extraction, 20 μL of 0.5 M sodium acetate pH 5.2, 1 μL GlycoBlue, 1 μL carrier oligo (100 ng/μL), 600 μL of 100% ethanol) was incubated at −80° C. for 30 minutes and centrifuged at 16k rpm for 5 minutes to precipitate the DNA. The DNA pellet was washed with 500 μL of 70% ethanol, air dried for 5 minutes at room temperature, and resuspended in 11 μL of 10 mM Tris-Cl pH 8.0.

Condition 3 (Circ-ligated DNA treated with proteinase K) was first treated with proteinase K at 37° C. for 30 minutes, followed by purification using a chromatography column. The reaction was loaded on a Micro Bio-Spin P-6 column pre-washed with 10 mM Tris-Cl pH 8.0 and eluted by centrifuging the column for 4 minutes at 1,000×g.

Products from each condition (1-5) were then analyzed by quantitative PCR (qPCR). Three replicate qPCR reactions were set up for each sample, and the average Ct was calculated. As shown in Table 10, qPCR amplification of the no DNA control (5) gave an average Ct (cycle threshold) of 40, and the no ligase control (4) product gave an average Ct of 30.79, indicating high recovery of the input DNA under a condition without ligase treatment. Conditions 1 and 2 gave average Ct values of 34.80 and 35.18, respectively. Compared to the no ligase control (4), conditions 1 and 2 showed significant loss of amplifiable DNA after ligase treatment, even though both products were purified to remove free enzyme. Products from condition 3, in which ligase was removed through enzyme degradation before purification, showed an average Ct of 30.42, which was comparable to the no ligase control (4), indicating that removal of ligase is critical for efficient recovery of amplifiable DNA.

TABLE 10
Sample | Avg Ct
1. Ligated DNA purified by column | 34.80
2. Ligated DNA purified by phenol-chloroform extraction | 35.18
3. Ligated DNA treated with protease and purified by column | 30.42
4. No Ligase Control DNA purified by column | 30.79
5. No DNA Control | 40.00
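Ct differences in Table 10 can be translated into relative amounts of amplifiable template. Assuming 100% amplification efficiency (each cycle doubles the product), a common simplification, conditions 1 and 2 correspond to roughly 6% and 5% of the amplifiable template of the no ligase control, and condition 3 to slightly more than the control. The no DNA control is left out because its Ct of 40 presumably reflects no detectable amplification within the run rather than a measured template amount.

```python
# Average Ct values transcribed from Table 10 (condition number -> Avg Ct).
AVG_CT = {1: 34.80, 2: 35.18, 3: 30.42, 4: 30.79}
NO_LIGASE_CONTROL = 4


def relative_template(ct_sample: float, ct_control: float, efficiency: float = 1.0) -> float:
    """Amplifiable template relative to a control: (1 + E) ** -(Ct_sample - Ct_control).
    efficiency=1.0 assumes perfect doubling per cycle."""
    return (1.0 + efficiency) ** -(ct_sample - ct_control)


if __name__ == "__main__":
    control = AVG_CT[NO_LIGASE_CONTROL]
    for condition, ct in AVG_CT.items():
        print(f"condition {condition}: {relative_template(ct, control):.3f} of control")
```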

Example 17: Comparison of NGS Library Complexity

This example compares the complexity of NGS libraries prepared by different workflows. Workflows compared in this example include:

1. Library prepared through circularization and purification through a phenol chloroform extraction step before amplification;
2. Library prepared through circularization and direct amplification without purification;
3. Library prepared through circularization and protease treatment for 15 minutes, followed by amplification without purification;
4. Library prepared through circularization and protease treatment for 20 minutes, followed by amplification without purification;
5. Library prepared through circularization and protease treatment for 30 minutes, followed by amplification without purification;
6. Library prepared through circularization and protease treatment for 60 minutes, followed by amplification without purification.

For each condition, 20 ng of cfDNA in 12 μL was used as input for library construction. DNA samples were denatured at 96° C. for 30 seconds and chilled on an ice block for 2 minutes. The ligation mix (12 μL cfDNA, 2 μL of 10× CircLigase buffer, 4 μL of 5 M betaine, 1 μL of 50 mM MnCl₂, 1 μL of CircLigase II (Epicentre #CL9025K)) was set up on a cool block, and ligation was performed at 60° C. for 3 hours. The ligated DNA mixture was incubated at 80° C. for 45 seconds on a PCR machine, followed by exonuclease treatment: 1 μL of exonuclease mix (ExoI 20 U/μL:ExoIII 100 U/μL=1:2) was added to each tube, and reactions were incubated at 37° C. for 30 minutes.

For workflow 1, the ligation product was extracted with phenol chloroform and precipitated with salt and ethanol. 180 μL of 10 mM Tris was added to 20 μL of DNA from the exonuclease treatment to make a volume of 200 μL, and 200 μL of phenol was used to extract DNA. The aqueous layer was collected, and the DNA was recovered by ethanol precipitation. The ethanol co-precipitant mix (200 μL of DNA solution after phenol extraction, 20 μL of 0.5 M sodium acetate pH 5.2, 1 μL GlycoBlue, 1 μL carrier oligo (100 ng/μL), 600 μL of 100% ethanol) was incubated at −80° C. for 30 minutes and centrifuged at 16k rpm for 5 minutes to precipitate the DNA. The DNA pellet was washed with 500 μL of 70% ethanol.

The DNA pellet was air dried for 5 minutes at room temperature and resuspended in 11 μL of 10 mM Tris-Cl pH 8.0. For whole genome amplification (WGA), the purified DNA was first denatured at 65° C. for 5 minutes. 10 μL of denaturation buffer from the GE WGA kit was added to 10 μL of purified DNA. The DNA was cooled on a cool block or ice for 2 minutes. 20 μL of DNA was added to the Ready-To-Go GenomiPhi V3 cake (WGA). The WGA reaction was incubated at 30° C. for 1.5 hours, followed by heat inactivation at 65° C. for 10 minutes.

For workflow 2, 0.12 μL of 0.5 M EDTA and 0.58 μL of 1 M KCl were added to the exonuclease-treated ligation product and mixed well. The ligation mix was denatured at 95° C. for 2 minutes and cooled to 4° C. on ice before being added to the Ready-To-Go GenomiPhi V3 cake (WGA). The WGA reaction was incubated at 30° C. for 4.5 hours, followed by heat inactivation at 65° C. for 10 minutes.

For workflows 3-6, the exonuclease-treated ligation products were first treated with protease to remove CircLigase II. 1 μL of serine protease was added to each reaction, and the incubation time at 55° C. was titrated from 15 minutes to 60 minutes (condition 3: 15 minutes; condition 4: 20 minutes; condition 5: 30 minutes; condition 6: 60 minutes), followed by heat inactivation at 70° C. for 15 minutes; 0.12 μL of 0.5 M EDTA and 0.58 μL of 1 M KCl were then added to the treated ligation product. The ligation mix was then denatured at 95° C. for 2 minutes and cooled to 4° C. on ice before being added to the Ready-To-Go GenomiPhi V3 cake (WGA). The WGA reaction was incubated at 30° C. for 4.5 hours, followed by heat inactivation at 65° C. for 10 minutes.

For all conditions, WGA products were bead purified using AMPure XP magnetic beads and sonicated to an average size of 800 bp. The sonicated DNA samples were then used as input for standard sequencing library construction using the KAPA library preparation kit. Libraries were then sequenced on an Illumina HiSeq 2500, and library complexity was evaluated by calculating the number of unique molecules detected in each library. Library complexities are compared in FIG. 38 , with the number of unique molecules scaled on a relative basis. Workflow 1 showed the lowest complexity; workflow 2 had slightly higher library complexity than workflow 1, indicating less molecule loss by skipping purification; removal of ligase (workflows 3-6) significantly increased the number of unique molecules in the libraries, with longer protease treatment times leading to more unique molecules detected. The increase in molecules detected started to plateau after 30 minutes of incubation in this experiment.
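Library complexity as used here is a count of distinct molecules, reported on a relative scale. A minimal sketch, assuming each read (or read pair) has already been reduced to a hashable molecule key, for example mapping coordinates and strand or, for concatemer reads, a junction-plus-shredding-point tag; the key choice and function names are assumptions, not the procedure used to produce FIG. 38.

```python
from typing import Dict, Hashable, Iterable


def unique_molecules(molecule_keys: Iterable[Hashable]) -> int:
    """Number of distinct molecules; reads sharing a key are treated as duplicates."""
    return len(set(molecule_keys))


def relative_complexity(unique_counts: Dict[str, int]) -> Dict[str, float]:
    """Scale unique-molecule counts so that the most complex library equals 1.0."""
    top = max(unique_counts.values())
    return {name: count / top for name, count in unique_counts.items()}


if __name__ == "__main__":
    # Hypothetical keys: (chromosome, start, end, strand) of each read pair.
    keys = [("chr1", 10, 160, "+"), ("chr1", 10, 160, "+"), ("chr1", 12, 170, "-")]
    print(unique_molecules(keys))
```

In practice libraries are usually subsampled to the same number of reads before such counts are compared, since deeper sequencing alone detects more unique molecules.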

Example 18: Variant Calling

This example evaluates variant calling using methods described herein, wherein a sequence variant occurring in at least two different sheared polynucleotides is identified as a true sequence variant.

Genomic DNA from 9 cell lines was fragmented to an average size of ~150 bp and mixed to produce a DNA mix. The DNA mix covers 8 cancer hotspots (Table 11) at 0.1% allele frequency (AF). 20 ng of the DNA mix in 12 μL was used as input for library construction for each reaction. DNA samples were denatured at 96° C. for 30 seconds and chilled on an ice block for 2 minutes. After the addition of ligation mix (12 μL DNA mix, 2 μL of 10× CircLigase buffer, 4 μL of 5 M betaine, 1 μL of 50 mM MnCl₂, 1 μL of CircLigase II (Epicentre #CL9025K)), the reaction was set up on a cool block, and ligation was performed at 60° C. for 3 hours. The ligated DNA mixture was incubated at 80° C. for 45 seconds on a PCR machine, followed by exonuclease treatment: 1 μL of exonuclease mix (ExoI 20 U/μL:ExoIII 100 U/μL=1:2) was added to each tube, and reactions were incubated at 37° C. for 30 minutes. CircLigase II was removed by protease treatment, and 0.12 μL of 0.5 M EDTA/0.58 μL of 1 M KCl was added to the reactions. The ligation mix was then denatured at 95° C. for 2 minutes and cooled to 4° C. on ice before being added to the Ready-To-Go GenomiPhi V3 cake (whole genome amplification, WGA). The WGA reaction was incubated at 30° C. for 4.5 hours, followed by heat inactivation at 65° C. for 10 minutes.

WGA products were bead purified using AMPure XP magnetic beads and sonicated to an average size of ~800 bp. The sonicated DNA samples were then used as input for standard sequencing library construction using the KAPA library preparation kit. As illustrated in FIG. 39 , sonication or other methods of fragmentation (e.g., enzymatic cleavage) can be used to generate shorter concatemers from a long string of concatemers resulting from RCA. The resulting concatemers can have a variety of structures. The shorter concatemers formed may have different numbers of repeats, or copies of the input DNA sequence, and/or different starting/ending positions in reads relative to the junctions (e.g., concatemer shredding points). In some cases, concatemers having a variety of structures are produced by random priming (FIG. 40 ).

Libraries were sequenced on an Illumina HiSeq 2500, and sequencing reads were subjected to sequence analysis. In sequence analysis, the concatemer shredding points can be combined with the junction sequences of sequencing reads to create unique tags that can be associated with sequencing reads. Consensus sequences built from read families identified by the unique tags can be used to perform additional error correction.

Various voting schemes may be used to build consensus sequences from the families of reads and to perform variant calling. A consensus, for example, can be built based on the majority of the reads in the family reporting the same variant. Sequence differences can be identified between a read family consensus and a reference sequence, and a variant is called, in some cases, when the same sequence difference occurs in at least two different read families. The unique tags can also be used to help count input molecules and compute allele frequencies.

In an exemplary algorithm to identify the concatemer shredding points and perform variant calling:

1) Identify in the concatemer sequences (reads) the repeat length, repeat regions, and junctions, for example, by self-alignment, alignment to reference sequences, or other computational methods.
2) Determine the positions of the junctions within the reads by aligning the reads to the reference sequence.
3) Calculate the positions of the concatemer shredding points (at both ends) by shifting from the reference start or end positions of the junction, where the number of bases to shift is determined by the read positions of the adjacent junctions.
4) Group reads based on their shredding points in combination with their “junction tag” (e.g., unique tag), creating read families, each of which has a unique tag.
5) If the reads in a read family vote unanimously for the variant, that is, all the reads report the variant with concatemer confirmation, the family is counted as a variant family (FIG. 41A); otherwise the family is not counted as a variant family (FIG. 41B).
6) The number of variant families can be used to determine whether an input molecule identified by “junction tag” can be counted as a “variant molecule,” where various cutoffs may apply. For example, in FIG. 41C, a variant is called when the same sequence difference (start) occurred in at least two different read families. In some cases, the number of “variant molecules” can be used to determine whether a variant should be called and can be used to calculate allele frequency.
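Steps 4-6 can be sketched as below, assuming steps 1-3 have already reduced each read to its junction tag, its two shredding points, and whether it reports the candidate sequence difference. The `Read` type and `call_variant` function are hypothetical; the sketch counts variant families directly against a cutoff of two (as in FIG. 41C) and omits the roll-up from families to "variant molecules" and the allele-frequency calculation.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Tuple


class Read(NamedTuple):
    """One sequencing read, reduced to what steps 4-6 need (hypothetical fields)."""
    junction: Tuple[int, int]  # junction tag: reference start/end of the junction
    left_shred: int            # reference position of the 5' shredding point
    right_shred: int           # reference position of the 3' shredding point
    reports_variant: bool      # True if the read shows the candidate sequence difference


def call_variant(reads: Iterable[Read], min_variant_families: int = 2):
    """Return (called, number_of_variant_families, number_of_families)."""
    # Step 4: a read family is all reads sharing junction tag + both shredding points.
    families = defaultdict(list)
    for read in reads:
        tag = (read.junction, read.left_shred, read.right_shred)
        families[tag].append(read.reports_variant)
    # Step 5: a family counts as a variant family only if its vote is unanimous.
    variant_families = sum(1 for votes in families.values() if all(votes))
    # Step 6: call the variant when enough distinct families support it.
    return variant_families >= min_variant_families, variant_families, len(families)


if __name__ == "__main__":
    reads = [
        Read((100, 249), 200, 199, True), Read((100, 249), 200, 199, True),
        Read((100, 249), 210, 209, True),
        Read((120, 260), 150, 140, True), Read((120, 260), 150, 140, False),
    ]
    print(call_variant(reads))
```

In the usage example, the first two families share a junction but differ in shredding points, so they count as separate supporting families, while the third family is excluded because its reads disagree.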

To determine the improvement in variant detection from the methods described herein, the number of variants identified by calling a sequence difference detected in sequencing reads as the variant only when the sequence difference occurs in a majority of the sequencing reads from a first sheared polynucleotide and a majority of the sequencing reads from a second sheared polynucleotide was compared to the number of variants identified by methods without such a requirement. As shown in Table 12, the requirement that a sequence variant occur in at least two different sheared polynucleotides reduced non-specific false positive variants by 23.39%, while none of the true positive calls were removed (0% of the specific true variants were removed), indicating a significant improvement in specificity without affecting sensitivity.

TABLE 11 List of cancer hotspots in the mixed cell line DNA

| Gene | Variant |
|---|---|
| PIK3CA | H1047R |
| KRAS | G12D |
| EGFR | L747-E749delA750P |
| EGFR | T790M |
| EGFR | L858R |
| NRAS | Q61R |
| BRAF | V600E |
| EGFR | G719S |

TABLE 12 Variant detection using methods described herein

| | Total variants before filtering | Total variants after filtering | % non-specific variants removed | # spike-in cancer hotspots detected before filtering | # spike-in cancer hotspots detected after filtering | % specific variants removed |
|---|---|---|---|---|---|---|
| Replicate 1 | 1260 | 945 | 25.00% | 6 | 6 | 0.00% |
| Replicate 2 | 1194 | 934 | 21.78% | 7 | 7 | 0.00% |
| Average | 1227 | 939.5 | 23.39% | 6.5 | 6.5 | 0.00% |
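The percentages in Table 12 follow directly from the before and after counts, as (before − after) / before × 100, and the Average row is the arithmetic mean of the two replicates. The short Python check below reproduces those values using only the counts in the table.

```python
# Counts from Table 12: (before filtering, after filtering) for each replicate.
nonspecific = {"Replicate 1": (1260, 945), "Replicate 2": (1194, 934)}
hotspots = {"Replicate 1": (6, 6), "Replicate 2": (7, 7)}


def percent_removed(before, after):
    """Percentage of calls removed by the filter."""
    return 100.0 * (before - after) / before


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


nonspecific_pct = {rep: round(percent_removed(*c), 2) for rep, c in nonspecific.items()}
hotspot_pct = {rep: round(percent_removed(*c), 2) for rep, c in hotspots.items()}
average_pct = round(mean(percent_removed(*c) for c in nonspecific.values()), 2)
average_before = mean(before for before, _ in nonspecific.values())
average_after = mean(after for _, after in nonspecific.values())

print(nonspecific_pct)  # {'Replicate 1': 25.0, 'Replicate 2': 21.78}
print(average_pct, average_before, average_after)  # 23.39 1227.0 939.5
```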

While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

1.-22. (canceled)
23. A method of identifying a sequence variant in a nucleic acid sample comprising a plurality of polynucleotides, each polynucleotide of the plurality having a 5′ end and a 3′ end, the method comprising: (a) circularizing individual polynucleotides of said plurality to form a plurality of circular polynucleotides, each of which having a junction between the 5′ end and 3′ end; (b) amplifying the circular polynucleotides of (a) to produce amplified polynucleotides; (c) shearing the amplified polynucleotides to produce sheared polynucleotides, each sheared polynucleotide comprising one or more shear points at a 5′ end and/or a 3′ end; (d) sequencing the sheared polynucleotides to produce a plurality of sequencing reads; (e) identifying sequence differences between sequencing reads and a reference sequence; and (f) calling a sequence difference as the sequence variant when the sequence difference occurs in at least two different sheared polynucleotides.
24. The method of claim 23, wherein calling the sequence difference as the sequence variant occurs further when (i) the sequence difference is identified on both strands of a double-stranded input molecule; and/or (ii) the sequence difference occurs in a consensus sequence for a concatemer formed by amplification comprising rolling circle amplification.
25. The method of claim 23, wherein the plurality of polynucleotides is single-stranded.
26. The method of claim 23, wherein circularizing is effected by subjecting the plurality of polynucleotides to a ligation reaction.
27. The method of claim 23, wherein the sequence variant is a single nucleotide polymorphism.
28. The method of claim 23, wherein the reference sequence is a consensus sequence formed by aligning the sequencing reads with one another.
29. The method of claim 23, wherein the reference sequence is a sequencing read.
30. The method of claim 23, wherein circularizing comprises the step of joining an adapter polynucleotide to the 5′ end, the 3′ end, or both the 5′ end and the 3′ end of a polynucleotide in the plurality of polynucleotides.
31. The method of claim 23, wherein amplifying is effected by using a polymerase having strand-displacement activity.
32. The method of claim 23, wherein amplifying comprises subjecting the circular polynucleotides to an amplification reaction mixture comprising random primers.
33. The method of claim 23, wherein amplifying comprises subjecting the circular polynucleotides to an amplification reaction mixture comprising one or more primers, each of which specifically hybridizes to a different target sequence via sequence complementarity.
34. The method of claim 23, wherein the amplified polynucleotides are subjected to the sequencing step without enrichment.
35. The method of claim 23, further comprising enriching one or more target polynucleotides among the amplified polynucleotides by performing an enrichment step prior to sequencing.
36. (canceled)
37. The method of claim 23, wherein the sample is a sample from a subject.
38. The method of claim 37, wherein the sample is urine, stool, blood, saliva, tissue, or bodily fluid.
39. The method of claim 37, wherein the sample comprises tumor cells.
40. The method of claim 37, wherein the sample is a formalin-fixed paraffin embedded (FFPE) sample.
41. The method of claim 37, further comprising diagnosing, and optionally treating, said subject based on the calling step.
42. The method of claim 23, wherein the sequence variant is a causal genetic variant.
43. The method of claim 23, wherein the sequence variant is associated with a type or stage of cancer.
44. The method of claim 23, wherein the plurality of polynucleotides comprises cell-free polynucleotides.
45. The method of claim 44, wherein the cell-free polynucleotides comprise circulating tumor DNA.
46.-103. (canceled)